WILEY PUBLICATIONS IN STATISTICS 


Walter A. Shewhart, Samuel S. Wilks
Editors 


Mathematical Statistics 


ANDERSON · An Introduction to Multivariate Statistical Analysis
BLACKWELL and GIRSHICK · Theory of Games and Statistical Decisions
CRAMÉR · The Elements of Probability Theory and Some of Its Applications
DOOB · Stochastic Processes
DWYER · Linear Computations
FELLER · An Introduction to Probability Theory and Its Applications, Volume I, Second Edition
FISHER · Contributions to Mathematical Statistics
FRASER · Nonparametric Methods in Statistics
FRASER · Statistics—An Introduction 
GRENANDER · Probability and Statistics
GRENANDER and ROSENBLATT · Statistical Analysis of Stationary Time Series
HANSEN, HURWITZ, and MADOW · Sample Survey Methods and Theory, Volume II
HOEL · Introduction to Mathematical Statistics, Second Edition
KEMPTHORNE · The Design and Analysis of Experiments
KULLBACK · Information Theory and Statistics
LEHMANN · Testing Statistical Hypotheses
PARZEN · Modern Probability Theory and Its Applications
RAO · Advanced Statistical Methods in Biometric Research
RIORDAN · An Introduction to Combinatorial Analysis
SAVAGE · Foundations of Statistics
SCHEFFÉ · The Analysis of Variance
WALD · Sequential Analysis
WALD · Statistical Decision Functions


Applied Statistics 
ACTON · Analysis of Straight-Line Data
BENNETT and FRANKLIN · Statistical Analysis in Chemistry and the Chemical Industry
BUSH and MOSTELLER · Stochastic Models for Learning
CHEW · Experimental Designs in Industry
CLARK · An Introduction to Statistics
COCHRAN · Sampling Techniques
COCHRAN and COX · Experimental Designs, Second Edition
CORNELL · The Essentials of Educational Statistics
COX · Planning of Experiments
DEMING · Some Theory of Sampling
DODGE and ROMIG · Sampling Inspection Tables, Second Edition
FRYER · Elements of Statistics
GOULDEN · Methods of Statistical Analysis, Second Edition
HALD · Statistical Tables and Formulas
HALD · Statistical Theory with Engineering Applications
HANSEN, HURWITZ, and MADOW · Sample Survey Methods and Theory, Volume I
HOEL · Elementary Statistics
KEMPTHORNE · An Introduction to Genetic Statistics
MEYER · Symposium on Monte Carlo Methods
MUDGETT · Index Numbers
RICE · Control Charts
ROMIG · 50-100 Binomial Tables
TIPPETT · Technological Applications of Statistics
WILLIAMS · Regression Analysis
WOLD and JURÉEN · Demand Analysis
YOUDEN · Statistical Methods for Chemists


Books of Related Interest 


ALLEN and ELY - International Trade Statistics 

CHERNOFF and MOSES - Elementary Decision Theory 

HAUSER and LEONARD · Government Statistics for Business Use, Second Edition 
STEPHAN and McCARTHY · Sampling Opinions: An Analysis of Survey Procedures


Modern Probability Theory
and Its Applications

EMANUEL PARZEN
Associate Professor of Statistics
Stanford University

A WILEY PUBLICATION IN MATHEMATICAL STATISTICS

New York · London




COPYRIGHT © 1960 
BY 
JOHN WILEY & SONS, INC.




All Rights Reserved 


This book or any part thereof must not 
be reproduced in any form without the 
written permission of the publisher. 


COPYRIGHT, CANADA, 1960, INTERNATIONAL COPYRIGHT, 1960 
JOHN WILEY & SONS, INC., PROPRIETOR


All Foreign Rights Reserved 


Reproduction in whole or in part forbidden. 


LIBRARY OF CONGRESS CATALOG CARD NUMBER: 60-6456


PRINTED IN THE UNITED STATES OF AMERICA 




To the memory 
of my mother and father 


The conception of chance enters into the very first steps of scientific activity, in virtue of the fact that no observation is absolutely correct. I think chance is a more fundamental conception than causality; for whether in a concrete case a cause-effect relation holds or not can only be judged by applying the laws of chance to the observations.

MAX BORN
Natural Philosophy of Cause and Chance


Preface 


The notion of probability, and consequently the mathematical theory 
of probability, has in recent years become of interest to many scientists 
and engineers. There has been an increasing awareness that not "Will
it work?" but "What is the probability that it will work?" is the proper
question to ask about an apparatus. Similarly, in investigating the posi- 
tion in space of certain objects, "What is the probability that the object 
is in a given region?" is a more appropriate question than "Is the object 
in the given region?" As a result, the feeling is becoming widespread 
that a basic course in probability theory should be a part of the under- 
graduate training of all scientists, engineers, mathematicians, statisticians, 
and mathematics teachers. 

A basic course in probability theory should serve two ends. 

On the one hand, probability theory is a subject with great charm 
and intrinsic interest of its own, and an appreciation of the fact should 
be communicated to the student. Brief explanations of some of the ideas 
of probability theory are to be found scattered in many books written 
about many diverse subjects. The theory of probability thus presented 
sometimes appears confusing because it seems to be a collection of 
tricks, without an underlying unity. On the contrary, its concepts pos- 
sess meanings of their own that do not depend on particular applica- 
tions. Because of this fact, they provide formal analogies between real 
phenomena, which are themselves totally different but which in certain 
theoretical aspects can be treated similarly. For example, the factors 
affecting the length of the life of a man of a certain age and the factors 



viii PREFACE 


affecting the time a light bulb will burn may be quite different, yet 
similar mathematical ideas may be used to describe both quantities. 

On the other hand, a course in probability theory should serve as a 
background to many courses (such as statistics, statistical physics, in- 
dustrial engineering, communication engineering, genetics, statistical 
psychology, and econometrics) in which probabilistic ideas and tech- 
niques are employed. Consequently, in the basic course in probabil- 
ity theory one should attempt to provide the student with a confident 
technique for solving probability problems. To solve these problems, 
there is no need to employ intuitive witchcraft. In this book it is shown
how one may formulate probability problems in a mathematical manner 
so that they may be systematically attacked by routine methods. The 
basic step in this procedure is to express any event whose probability 
of occurrence is being sought as a set of sample descriptions, defined on 
the sample description space of the random phenomenon under con- 
sideration. In a similar spirit, the notion of random variable, together 
with the sometimes bewildering array of notions that must be introduced 
simultaneously, is presented in easy stages by first discussing the notion 
of numerical valued random phenomena. 

This book is written as a textbook for a course in probability that can 
be adapted to the needs of students with diverse interests and back- 
grounds. In particular, it has been my aim to present the major ideas 
of modern probability theory without assuming that the reader knows 
the advanced mathematics necessary for a rigorous discussion. 

The first six chapters constitute a one-quarter course in elementary
probability theory at the sophomore or junior level. For the study of
these chapters, the student need have had only one year of college
calculus. Students with more mathematical background would also
cover Chapters 7 and 8. The material in the first eight chapters (omit-
ting the last section in each) can be conveniently covered in thirty-nine
class hours by students with a good working knowledge of calculus.


Many of the sections of the book can be read independently of one an- 
other without loss of continuity. 


Chapters 9 and 10 are much less elementary in character than the
first eight chapters. They constitute an introduction to the limit
theorems of probability theory and to the role of characteristic functions
in probability theory. These chapters provide careful and rigorous
derivations of the law of large numbers and the central limit theorem
and contain many new proofs.


In studying probability theory, the reader is exploring a way of think-
ing that is undoubtedly novel to him. Consequently, it is important that
he have available a large number of interesting problems that at once
illustrate and test his grasp of the theory. More than 160 examples,
120 theoretical exercises, and 480 exercises are contained in the text. 
The exercises are divided into two categories and are collected at the 
end of each section rather than at the end of the book or at the end 
of each chapter. The theoretical exercises extend the theory; they are 
stated in the form of assertions that the student is asked to prove. The 
nontheoretical exercises are numerical problems concerning concrete 
random phenomena and illustrate the variety of situations to which 
probability theory may be applied. The answers to odd-numbered 
exercises are given at the end of the book; the answers to even- 
numbered exercises are available in a separate booklet. 

In choosing the notation I have adopted in this book, it has been my 
aim to achieve a symbolism that is self-explanatory and that can be read 
as if it were English. Thus the symbol $F_X(x)$ is defined as "the distribution
function of the random variable X evaluated at the real number
x." The terminology adopted agrees, I believe, with that used by
most recent writers on probability theory. 

The author of a textbook is indebted to almost everyone who has 
touched the field. I especially desire to express my intellectual indebted- 
ness to the authors whose works are cited in the brief literature survey 
given in section 8 of Chapter 1. 

To my colleagues at Stanford, and especially to Professors A. Bowker 
and S. Karlin, I owe a great personal debt for the constant encouragement
they have given me and for the stimulating atmosphere they have
provided. All have contributed much to my understanding of proba- 
bility theory and statistics. 

I am very grateful for the interest and encouragement accorded me 
by various friends and colleagues. I particularly desire to thank Marvin 
Zelen for his valuable suggestions. 

To my students at Stanford who have contributed to this book by 
their comments, I offer my thanks. Particularly valuable assistance has 
been rendered by E. Dalton and D. Ylvisaker and also by M. Boswell 
and P. Williams. 

To the cheerful, hard-working staff of the Applied Mathematics and 
Statistics Laboratory at Stanford, I wish to express my gratitude for 
their encouragement. Great thanks are due also to Mrs. Mary Alice 
McComb and Mrs. Isolde Field for their excellent typing and to Mrs. 
Betty Jo Prine for her excellent drawings. 

EMANUEL PARZEN 

Stanford, California 

January 1960 


Contents

CHAPTER

1  PROBABILITY THEORY AS THE STUDY OF MATHEMATICAL MODELS OF RANDOM PHENOMENA
   1  Probability theory as the study of random phenomena
   2  Probability theory as the study of mathematical models of random phenomena
   3  The sample description space of a random phenomenon
   4  Events
   5  The definition of probability as a function of events on a sample description space
   6  Finite sample description spaces
   7  Finite sample description spaces with equally likely descriptions
   8  Notes on the literature of probability theory

2  BASIC PROBABILITY THEORY
   1  Samples and n-tuples
   2  Posing probability problems mathematically
   3  The number of "successes" in a sample
   4  Conditional probability
   5  Unordered and partitioned samples: occupancy problems
   6  The probability of occurrence of a given number of events

3  INDEPENDENCE AND DEPENDENCE
   1  Independent events and families of events
   2  Independent trials
   3  Independent Bernoulli trials
   4  Dependent trials
   5  Markov dependent Bernoulli trials
   6  Markov chains

4  NUMERICAL-VALUED RANDOM PHENOMENA
   1  The notion of a numerical-valued random phenomenon
   2  Specifying the probability law of a numerical-valued random phenomenon
      Appendix: The evaluation of integrals and sums
   3  Distribution functions
   4  Probability laws
   5  The uniform probability law
   6  The normal distribution and density functions
   7  Numerical n-tuple valued random phenomena

5  MEAN AND VARIANCE OF A PROBABILITY LAW
   1  The notion of an average
   2  Expectation of a function with respect to a probability law
   3  Moment-generating functions
   4  Chebyshev's inequality
   5  The law of large numbers for independent repeated Bernoulli trials
   6  More about expectation

6  NORMAL, POISSON, AND RELATED PROBABILITY LAWS
   1  The importance of the normal probability law
   2  The approximation of the binomial probability law by the normal and Poisson probability laws
   3  The Poisson probability law
   4  The exponential and gamma probability laws
   5  Birth and death processes

7  RANDOM VARIABLES
   1  The notion of a random variable
   2  Describing a random variable
   3  An example, treated from the point of view of numerical n-tuple valued random phenomena
   4  The same example treated from the point of view of random variables
   5  Jointly distributed random variables
   6  Independent random variables
   7  Random samples, randomly chosen points (geometrical probability), and random division of an interval
   8  The probability law of a function of a random variable
   9  The probability law of a function of random variables
   10 The joint probability law of functions of random variables
   11 Conditional probability of an event given a random variable. Conditional distributions.

8  EXPECTATION OF A RANDOM VARIABLE
   1  Expectation, mean, and variance of a random variable
   2  Expectations of jointly distributed random variables
   3  Uncorrelated and independent random variables
   4  Expectations of sums of random variables
   5  The law of large numbers and the central limit theorem
   6  The measurement signal-to-noise ratio of a random variable
   7  Conditional expectation. Best linear prediction

9  SUMS OF INDEPENDENT RANDOM VARIABLES
   1  The problem of addition of independent random variables
   2  The characteristic function of a random variable
   3  The characteristic function of a random variable specifies its probability law
   4  Solution of the problem of the addition of independent random variables by the method of characteristic functions
   5  Proofs of the inversion formulas for characteristic functions

10 SEQUENCES OF RANDOM VARIABLES
   1  Modes of convergence of a sequence of random variables
   2  The law of large numbers
   3  Convergence in distribution of a sequence of random variables
   4  The central limit theorem
   5  Proofs of theorems concerning convergence in distribution

Tables
Answers to Odd-Numbered Exercises
Index


List of Important Tables

TABLE
2-6A  THE PROBABILITIES OF VARIOUS EVENTS DEFINED ON THE GENERAL OCCUPANCY AND SAMPLING PROBLEMS
5-3A  SOME FREQUENTLY ENCOUNTERED DISCRETE PROBABILITY LAWS AND THEIR MOMENTS AND GENERATING FUNCTIONS
5-3B  SOME FREQUENTLY ENCOUNTERED CONTINUOUS PROBABILITY LAWS AND THEIR MOMENTS AND GENERATING FUNCTIONS
8-6A  MEASUREMENT SIGNAL TO NOISE RATIO OF RANDOM VARIABLES OBEYING VARIOUS PROBABILITY LAWS
I     AREA UNDER THE NORMAL DENSITY FUNCTION; A TABLE OF $\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\,dy$
II    BINOMIAL PROBABILITIES; A TABLE OF $\binom{n}{x} p^x (1-p)^{n-x}$, FOR $n = 1, 2, \ldots, 10$, AND VARIOUS VALUES OF $p$
III   POISSON PROBABILITIES; A TABLE OF $e^{-\lambda}\lambda^x/x!$, FOR VARIOUS VALUES OF $\lambda$


CHAPTER 1

Probability Theory as the Study
of Mathematical Models
of Random Phenomena


The purpose of this chapter is to discuss the nature of probability theory.
In section 1 we point out the existence of a certain body of phenomena
that may be called random. In section 2 we state the view, which is adopted
in this book, that probability theory is the study of mathematical models
of random phenomena. The language and notions that are used to formu-
late mathematical models are discussed in sections 3 to 7.


1. PROBABILITY THEORY AS THE STUDY OF
RANDOM PHENOMENA


One of the most striking features of the present day is the steadily
increasing use of the ideas of probability theory in a wide variety of
scientific fields, involving matters as remote and different as the prediction
by geneticists of the relative frequency with which various characteristics
occur in groups of individuals, the calculation by telephone engineers of
the density of telephone traffic, the maintenance by industrial engineers of
manufactured products at a certain standard of quality, the transmission


2 FOUNDATIONS OF PROBABILITY THEORY сн. 1 


(by engineers concerned with the design of communications and automatic-
control systems) of signals in the presence of noise, and the study by
physicists of thermal noise in electric circuits and the Brownian motion of
particles immersed in a liquid or gas. What is it that is studied in proba-
bility theory that enables it to have such diverse applications? In order
to answer this question, we must first define the property that is possessed
in common by phenomena such as the number of individuals possessing
a certain genetical characteristic, the number of telephone calls made in a
given city between given hours of the day, the standard of quality of the
items manufactured by a certain process, the number of automobile
accidents each day on a given highway, and so on. Each of these phenom-
ena may often be considered a random phenomenon in the sense of the
following definition.


A random (or chance) phenomenon is an empirical phenomenon charac-
terized by the property that its observation under a given set of circum-
stances does not always lead to the same observed outcome (so that there
is no deterministic regularity) but rather to different outcomes in such a
way that there is statistical regularity. By this is meant that numbers exist
between 0 and 1 that represent the relative frequency with which the
different possible outcomes may be observed in a series of observations of
independent occurrences of the phenomenon.

Closely related to the notion of a random phenomenon are the notions
of a random event and of the probability of a random event. A random
event is one whose relative frequency of occurrence, in a very long sequence
of observations of situations in which the event may occur, approaches a
stable limit value as the number of observations is increased to infinity;
the limit value of the relative frequency is called the probability of the
random event.

In order to bring out in more detail what is meant by a random phenom-
enon, let us consider a typical random event; namely, an automobile
accident. It is evident that just where, when, and how a particular
accident takes place depends on an enormous number of factors, a
change in any one of which could [...]; a different turn of the steering
wheel might have prevented the accident altogether or changed its
character completely, either for the better or for the worse. For any
motorist starting out on a given highway it cannot be predicted whether
he will have

SEC. 
1 RANDOM PHENOMENA 3 


an automobile accident. Nevertheless, if we observe all (or merely some)
of the motorists starting out on this highway on a [...], we may adopt
the belief that what happens to a motorist driving on this highway is a
random phenomenon and that the event of his having an automobile
accident is a random event.
Another typical random phenomenon arises when we consider the
experiment of drawing a ball from an urn. In particular, let us examine an
urn (or a bowl) containing six balls, of which four are white and two are
red. Except for color, the balls are identical in every detail. Let a ball
be drawn and its color noted. We might be tempted to ask, "What will be
the color of a ball drawn from the urn?" However, it is clear that there is
no answer to this question. If one actually performs the experiment of
drawing a ball from an urn, such as the one described, the color of the ball
one draws will sometimes be white and sometimes red. Thus the outcome
of the experiment of drawing a ball is unpredictable.
Yet there are things that are predictable about this experiment. In
Table 1A the results of 600 independent trials are given (that is, we have
taken an urn containing four white balls and two red balls, mixed the balls
well, drawn a ball, and noted its color, after which the ball drawn was
returned to the urn; these operations were repeated 600 times). It is seen
that in each block of 100 trials (as well as in the entire set of 600 trials) the
proportion of experiments in which a white ball is drawn is approximately
equal to 2/3. Consequently, one may be tempted to assert that the propor-
tion 2/3 has some real significance for this experiment and that in a reasonably
long series of trials of the experiment 2/3 of the balls drawn will be colored
white. If one succumbs to this temptation, then one has asserted that the
outcome of the experiment (of drawing a ball from an urn containing six
balls, of which four are white and two are red) is a random phenomenon.

TABLE 1A

The number of white balls drawn in 600 trials of the experiment of drawing
a ball from an urn containing four white balls and two red balls.

In Trials    Number of White    In Trials    Proportion of White
Numbered     Balls Drawn        Numbered     Balls Drawn
1-100        69                 1-100        0.690
101-200      70                 1-200        0.695
201-300      59                 1-300        0.660
301-400      63                 1-400        0.653
401-500      76                 1-500        0.674
501-600      64                 1-600        0.668
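The experiment summarized in Table 1A can be imitated by computer. The following Python sketch is a hypothetical illustration, not a procedure from the text: a seeded pseudorandom generator (an assumption standing in for a well-mixed urn) draws with replacement from six balls, four white and two red, and records the cumulative proportion of white balls after each block of 100 draws, as in the last column of Table 1A. The function name `simulate_urn` and the seed are choices made here. The simulated figures will not reproduce the table, but they should likewise stay near 2/3.

```python
import random


def simulate_urn(n_trials=600, block=100, seed=1960):
    """Draw n_trials balls with replacement from an urn holding four
    white and two red balls; return the cumulative proportion of white
    draws after each block of trials (hypothetical helper)."""
    rng = random.Random(seed)           # seeded so the run is repeatable
    urn = ["white"] * 4 + ["red"] * 2   # the six balls in the urn
    whites = 0
    proportions = []
    for trial in range(1, n_trials + 1):
        ball = rng.choice(urn)          # ball is returned, so the urn never changes
        if ball == "white":
            whites += 1
        if trial % block == 0:
            proportions.append(round(whites / trial, 3))
    return proportions


if __name__ == "__main__":
    for i, p in enumerate(simulate_urn(), start=1):
        print(f"trials 1-{100 * i}: proportion white = {p:.3f}")
```

Rerunning with more trials (for example `simulate_urn(60000, 60000)`) gives a single cumulative proportion that typically lies much closer to 2/3, which is the statistical regularity the text describes.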
More generally, if one believes that the experiment of drawing a ball 
from an urn will, in a long series of trials, yield a white ball in some definite 
proportion (which one may not know) of the trials of the experiment, then 
one has asserted (i) that the drawing of a ball from such an urn is a random 
phenomenon and (ii) that the drawing of a white ball is a random event. 
Let us give an illustration of the way in which one may use the know- 
ledge (or belief) that a phenomenon is random. Consider a group of 
300 persons who are candidates for admission to a certain school at which 
there are facilities for only 200 students. In the interest of fairness it is 
decided to use a random mechanism to choose the students from among 
the candidates. In one possible random method the 300 candidates are 
assembled in a room. Each candidate draws a ball from an urn containing 
six balls, of which four are white; those who draw white balls are admitted 
as students. Given an individual student, it cannot be foretold whether or 
not he will be admitted by this method of selection. Yet, if we believe that 
the outcome of the experiment of drawing a ball possesses the property of 
statistical regularity, then on the basis of the experiment represented by 
Table 1A, which indicates that the probability of drawing a white ball is
2/3, we believe that the number of candidates who will draw white balls, and
consequently be admitted as students, will be approximately equal to 200 
(note that 200 represents the product of (i) the number of trials of the 
experiment and (ii) the probability of the event that the experiment will 
yield a white ball). By a more careful analysis, one can show that the 


probability is quite high that the number of candidates who will draw white 
balls is between 186 and 214. 
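If each candidate's draw is treated as an independent trial with probability 2/3 of yielding a white ball (an assumption; the text does not say which careful analysis it has in mind), the number N of candidates admitted obeys the binomial probability law, and the claim can be checked by direct summation. In this Python sketch the helper `binomial_prob_between` is a hypothetical name; exact rational arithmetic via `fractions.Fraction` keeps the expected count exact.

```python
from fractions import Fraction
from math import comb


def binomial_prob_between(n, p, lo, hi):
    """P(lo <= N <= hi) when N counts successes in n independent
    trials, each a success with probability p (binomial law)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(lo, hi + 1))


n, p = 300, Fraction(2, 3)      # 300 candidates, P(white ball) = 2/3
expected = n * p                # number of trials times the probability
prob = binomial_prob_between(n, p, 186, 214)

print(f"expected number admitted: {expected}")
print(f"P(186 <= N <= 214) = {float(prob):.4f}")
```

Under these assumptions the expected number admitted is exactly 300 · 2/3 = 200, and the computed probability that N falls between 186 and 214 comes out a little above 0.9.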


One of the aims of this book is to show how by means of probability
theory the same mathematical procedure can be used to solve quite
different problems. To illustrate this point, we consider a variation of the
foregoing problem which is of great practical interest. Many colleges find
that only a certain proportion of the students they admit actually enroll.
Consequently a college must decide how many students to admit in order
to be sure that enough students will enroll. Suppose that a college finds
that only two-thirds of the students it admits enroll; one may then say that
the probability is 2/3 that a student will enroll. If the college desires to
ensure that about 200 students will enroll, it should admit 300 students.
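The college problem uses the same arithmetic run in reverse: if each admitted student enrolls with probability p, admitting n students yields np enrollments on average, so the college needs the smallest n with np at least equal to the target. A minimal Python sketch, with the function name `admissions_needed` chosen here for illustration; exact fractions are used so that the ceiling cannot be thrown off by floating-point rounding.

```python
from fractions import Fraction
from math import ceil


def admissions_needed(target, p_enroll):
    """Smallest number of admissions whose expected enrollment,
    admitted * p_enroll, is at least target (hypothetical helper)."""
    return ceil(Fraction(target) / Fraction(p_enroll))


print(admissions_needed(200, Fraction(2, 3)))  # prints 300
```

This matches the average only; actual enrollment will fluctuate around 200, so a college that wants to be sure of 200 enrollees needs some margin beyond 300 admissions.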


SEC. 2 MATHEMATICAL MODELS OF RANDOM PHENOMENA 5 


EXERCISES 


1.1. Give an example of a random phenomenon that would be studied by 
(i) a physicist, (ii) a geneticist, (iii) a traffic engineer, (iv) a quality-control 
engineer, (v) a communications engineer, (vi) an economist, (vii) a 
psychologist, (viii) a sociologist, (ix) an epidemiologist, (x) a medical 
researcher, (xi) an educator, (xii) an executive of a television broadcasting 
company. 

1.2. The Statistical Abstract of the United States (1957 edition, p. 57) reports that 
among the several million babies born in the United States the number of 
boys born per 1000 girls was as follows for the years listed: 


        Male Births per
Year    1000 Female Births
1935    1053
1940    1054
1945    1055
1950    1054
1951    1052
1952    1051
1953    1053
1954    1051
1955    1051

Would you say the event that a newborn baby is a boy is a random event?
If so, what is the probability of this random event? Explain your reasoning.

1.3. A discussion question. Describe how you would explain to a layman the
meaning of the following statement: An insurance company is not gambling
with its clients because it knows with sufficient accuracy what will happen
to every thousand or ten thousand or a million people even when the
company cannot tell what will happen to any individual among them.
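For Exercise 1.2, one way to begin (a sketch of the arithmetic, not an answer supplied by the text) is to convert each ratio of m boys per 1000 girls into the proportion m/(m + 1000) of all births that are boys. The Python below applies that conversion to the table in the exercise; the dictionary and function names are hypothetical. The resulting proportions all lie between about 0.512 and 0.514, the kind of year-to-year stability that the notion of statistical regularity refers to; whether that justifies calling the event random is the question the exercise poses.

```python
# Male births per 1000 female births, copied from the table in Exercise 1.2.
male_per_1000_female = {
    1935: 1053, 1940: 1054, 1945: 1055, 1950: 1054, 1951: 1052,
    1952: 1051, 1953: 1053, 1954: 1051, 1955: 1051,
}


def proportion_male(m):
    """Convert m boys per 1000 girls into the proportion of births that are boys."""
    return m / (m + 1000)


for year, m in male_per_1000_female.items():
    print(year, round(proportion_male(m), 4))
```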


2. PROBABILITY THEORY AS THE STUDY OF 
MATHEMATICAL MODELS OF 
RANDOM PHENOMENA 


One view that one may take about the nature of probability theory is 
that it is part of the study of nature in the same way that physics, chemistry,
and biology are. Physics, chemistry, and biology may each be defined as 
the study of certain observable phenomena, which we may call, respectively, 




the physical, chemical, and biological phenomena. Similarly, one might be 
tempted to define probability theory as the study of certain observable 
phenomena, namely the random phenomena. However, a random 
phenomenon is generally also a phenomenon of some other type; it is a
random physical phenomenon, or a random chemical phenomenon, and 
so on. Consequently, it would seem overly ambitious for researchers in 
probability theory to take as their province of research all random 
phenomena. In this book we take the view that probability theory is 
not directly concerned with the study of random phenomena but rather 
with the study of the methods of thinking that can be used in the 
study of random phenomena. More precisely, we make the following 
definition. 

The theory of probability is concerned with the study of those methods of 
analysis that are common to the study of random phenomena in all the fields in 
which they arise. Probability theory is thus the study of the study of 
random phenomena, in the sense that it is concerned with those properties 
of random phenomena that depend essentially on the notion of random- 
ness and not on any other aspects of the phenomenon considered. More 
fundamentally, the notions of randomness, of a random phenomenon, of 
statistical regularity, and of "probability" cannot be said to be obvious 
or intuitive. Consequently, one of the main aims of a study of the theory 
of probability is to clarify the meaning of these notions and to provide us 
with an understanding of them, in much the same way that the study of 
arithmetic enables us to count concrete objects and the study of electro- 
magnetic wave theory enables us to transmit messages by wireless. 

We regard probability theory as a part of mathematics. As is the case 
with all parts of mathematics, probability theory is constructed by means 
of the axiomatic method. One begins with certain undefined concepts. 
One then makes certain statements about the properties possessed by, and 
the relations between, these concepts. These statements are called the 
axioms of the theory. Then, by means of logical deduction, without any 
appeal to experience, various propositions (called theorems) are obtained 
from the axioms. Although the propositions do not refer directly to
the real world, but are merely logical consequences of the axioms,
they do represent conclusions about real phenomena, namely those real
phenomena one is willing to assume possess the properties postulated in
the axioms.


We are thus led to the notion of a mathematical model of a real phenom-
enon. A mathematical theory constructed by the axiomatic method is
said to be a model of a real phenomenon, if one gives a rule for translating
propositions of the mathematical theory into propositions about the real
phenomenon. This definition is vague, for it does not state the character




of the rules of translation one must employ. However, the foregoing 
definition is not meant to be a precise one but only to give the reader an 
intuitive understanding of the notion of a mathematical model. Generally 
speaking, to use a mathematical theory as a model for a real phenomenon, 
one needs only to give a rule for identifying the abstract objects about which
the axioms of the mathematical theory speak with aspects of the real 
phenomenon. It is then expected that the theorems of the theory will 
depict the phenomenon to the same extent that the axioms do, for the 
theorems are merely logical consequences of the axioms. 

As an example of the problem of building models for real phenomena, 
let us consider the problem of constructing a mathematical theory (or 
explanation) of the experience recorded in Table 1A, which led us to
believe that a long series of trials (of the experiment of drawing a ball from
an urn containing six balls, of which four are white and two red) would
yield a white ball in approximately 2/3 of the trials. In the remainder of this
chapter we shall construct a mathematical theory of this phenomenon,
which we believe to be a satisfactory model of certain features of it. It may
clarify the ideas involved, however, if we consider here an explanation of
this phenomenon, which we shall then criticize.

We imagine that we are permitted to label the six balls in the urn with 
numbers 1 to 6, labeling the four white balls with numbers 1 to 4. When a 
ball is drawn from the urn, there are six possible outcomes that can be 
recorded; namely, that ball number 1 was drawn, that ball number 2 was 
drawn, etc. Now four of these outcomes correspond to the outcome that a 
white ball is drawn. Therefore the ratio of the number of outcomes of the
experiment favorable to a white ball being drawn to the number of all
possible outcomes is equal to 2/3. Consequently, in order to "explain" why
the observed relative frequency of the drawing of a white ball from the
urn is equal to 2/3, one need only adopt this assumption (stated rather
informally): the probability of an event (by which is meant the relative
frequency with which an event, such as the drawing of a white ball, is
observed to occur in a long series of trials of some experiment) is equal to
the ratio of the number of outcomes of the experiment in which the event
may be observed to the number of all possible outcomes of the experiment.

There are several grounds on which one may criticize the foregoing
explanation. First, one may state that it is not mathematical, since it does
not possess a structure of axioms and theorems. This defect may perhaps
be remedied by using the tools that we develop in the remainder of this
chapter; consequently, we shall not press this criticism. However, there
is a second defect in the explanation that cannot be repaired. The assump-
tion stated, that the probability of an event is equal to a certain ratio, does
not lead to an explanation of the observed phenomenon because by counting


8 FOUNDATIONS OF PROBABILITY THEORY cH. 1 


in different ways one can obtain different values for the ratio. We have
already obtained a value of 2/3 for the ratio; we next obtain a value of 1/2.
If one argues that there are merely two outcomes (either a white ball or a
nonwhite ball is drawn), then exactly one of these outcomes is favorable
to a white ball being drawn. Therefore, the ratio of the number of
outcomes favorable to a white ball being drawn to the number of possible
outcomes is 1/2.
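The second defect can be made concrete by carrying out both counts side by side. A minimal Python sketch, assuming (as in the labeling above) that balls 1 to 4 are the white ones; the helper name `ratio` is illustrative only:

```python
from fractions import Fraction

# Assumed labeling from the text: balls 1-4 are white, balls 5-6 are red.
WHITE = {1, 2, 3, 4}
ALL_BALLS = {1, 2, 3, 4, 5, 6}

def ratio(favorable, possible):
    """Ratio of the number of favorable outcomes to the number of possible outcomes."""
    return Fraction(len(favorable), len(possible))

# Counting outcomes by ball number: 4 of the 6 outcomes give a white ball.
by_number = ratio(WHITE, ALL_BALLS)

# Counting outcomes by color only: 1 of the 2 outcomes ("W" or "R") is white.
by_color = ratio({"W"}, {"W", "R"})

print(by_number, by_color)  # 2/3 1/2
```

The same counting rule yields two different "probabilities" for the same event, which is the point of the criticism.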

We now proceed to develop the mathematical tools we require to construct
satisfactory models of random phenomena. 


3. THE SAMPLE DESCRIPTION SPACE OF A 
RANDOM PHENOMENON 


It has been stated that probability theory is the study of mathematical 
models of random phenomena; in other words, probability theory is 
concerned with the statements one can make about a random phenomenon 
about which one has postulated certain properties. The question im- 
mediately arises: how does one formulate postulates concerning a random 
phenomenon? This is done by introducing the sample description space of 
the random phenomenon. 

The sample description space of a random phenomenon, usually denoted 
by the letter S, is the space of descriptions of all possible outcomes of the
phenomenon. 

To be more specific, suppose that one is performing an experiment or
observing a phenomenon. For example, one may be tossing a coin, or
two coins, or 100 coins; or one may be measuring the height of people, or
both their height and weight, or their height, weight, waist size, and chest
size; or one may be measuring and recording the voltage across a circuit
at one point of time, or at two points of time, or for a whole interval of
time (by photographing the effect of the voltage upon an oscilloscope).
In all these cases one can imagine a space that consists of all possible
descriptions of the outcome of the experiment or observation. We call it
the sample description space, since the outcome of an experiment or
observation is usually called a sample. Thus a sample is something that
has been observed; a sample description is the name of something that
is observable.


A remark may be in order on the use of the word "space." The reader
should not confuse the notion of space as used in this book with the use of
the word space to denote certain parts of the world we live in, such as the
region between planets. A notion of great importance in modern mathe-
matics, since it is the starting point of all mathematical theories, is the
notion of a set. A set is a collection of objects (either concrete objects, 
such as books, cities, and people, or abstract objects, such as numbers, 
letters, and words). A set that is in some sense complete, so that only those 
objects in the set are to be considered, is called a space. In developing any 
mathematical theory, one has first to define the class of things with which 
the theory will deal; such a class of things, which represents the universe 
of discourse, is called a space. A space has neither dimension nor volume; 
rather, a space is a complete collection of objects. 

Techniques for the construction of the sample description space of a 
random phenomenon are systematically discussed in Chapter 2. For the 
present, to give the reader some idea of what sample description spaces 
look like, we consider a few simple examples. 

Suppose one is drawing a ball from an urn containing six balls, of which
four are white and two are red. The possible outcomes of the draw may be
denoted by W and R, and we write W or R accordingly, as the ball drawn
is white or red. In symbols, we write S = {W, R}. On the other hand, we
may regard the balls as numbered 1 to 6; then we write S = {1, 2, 3, 4, 5, 6}
to indicate that the possible outcome of a draw is a number, 1 to 6.

Next, let us suppose that one draws two balls from an urn containing
six balls, numbered 1 to 6. We shall need a notation for recording the
outcome of the two draws. Suppose that the first ball drawn bears number
5 and the second ball drawn bears number 3; we write that the outcome
of the two draws is (5, 3). The object (5, 3) is called a 2-tuple. We assume
that the balls are drawn one at a time and that the order in which the balls
are drawn matters. Then (3, 5) represents the outcome that first ball 3
and then ball 5 were drawn. Further, (3, 5) and (5, 3) represent different
possible outcomes. In terms of this notation, the sample description space
of the experiment of drawing two balls from an urn containing balls
numbered 1 to 6 (assuming that the balls are drawn in order and that the
ball drawn on the first draw is not returned to the urn before the second
draw is made) has 30 members:
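$$
\begin{aligned}
S = \{&(1,2), (1,3), (1,4), (1,5), (1,6),\\
&(2,1), (2,3), (2,4), (2,5), (2,6),\\
&(3,1), (3,2), (3,4), (3,5), (3,6),\\
&(4,1), (4,2), (4,3), (4,5), (4,6),\\
&(5,1), (5,2), (5,3), (5,4), (5,6),\\
&(6,1), (6,2), (6,3), (6,4), (6,5)\}
\end{aligned}
\tag{3.1}
$$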
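As a check on the count of 30 members in (3.1), the space can be generated mechanically. A minimal Python sketch, assuming ordered draws without replacement; `itertools.permutations` produces exactly the ordered 2-tuples of distinct ball numbers:

```python
from itertools import permutations

# Ordered draws of two distinct balls from an urn numbered 1 to 6,
# without replacement: each outcome is a 2-tuple (first draw, second draw).
S = set(permutations(range(1, 7), 2))

print(len(S))        # 30
print((5, 3) in S)   # True: ball 5 first, then ball 3
print((3, 3) in S)   # False: the same ball cannot be drawn twice
```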
We next consider an example that involves the measurement of numeri-
cal quantities. Suppose one is observing the ages (in years) of couples who
apply for marriage licenses in a certain city. We adopt the following
notation to record the outcome of the observation. Suppose one has
observed a man and a woman (applying for a marriage license) whose
ages are 24 and 22, respectively: we record this observation by writing the
2-tuple (24, 22). Similarly, (18, 80) represents the ages of a couple in
which the man's age is 18 and the woman's age is 80. Now let us suppose
that the age (in years) at which a man or a woman may get married is any
number, 1 to 200. It is clear that the number of possible outcomes of the
observation of the ages of a marrying couple is too large for them to be
conveniently listed; indeed, there are (200)(200) = 40,000 possible outcomes! One
thus sees that it is often more convenient to describe, rather than to list,
the sample descriptions that constitute the sample description space S.
To describe S in the example at hand, we write

$$
S = \{\text{2-tuples } (x, y) : x \text{ is any integer, 1 to 200};\ y \text{ is any integer, 1 to 200}\}
\tag{3.2}
$$


We have the following notation for forming sets. We draw two braces
to indicate that a set is being defined. Next, we can define the set either by
listing its members (for example, S = {W, R} and S = {1, 2, 3, 4, 5, 6}) or
by describing its members, as in (3.2). When the latter method is used, a
colon will always appear between the braces. On the left side of the colon,
one will describe objects of some general kind; on the right side of the
colon, one will specify a property that these objects must have in order to
belong to the set being defined.
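Python's set notation happens to mirror both ways of defining a set. A set display lists the members, and a set comprehension describes them. The following is a sketch under that analogy only; the variable names are illustrative and not from the text:

```python
# Listing members: the set is written out explicitly.
S_colors = {"W", "R"}
S_numbers = {1, 2, 3, 4, 5, 6}

# Describing members: an analogue of (3.2), {2-tuples (x, y): x is any
# integer, 1 to 200; y is any integer, 1 to 200}. The expression before
# "for" names the general kind of object; the "for" clauses play the role
# of the property stated to the right of the colon.
S_ages = {(x, y) for x in range(1, 201) for y in range(1, 201)}

print(len(S_colors), len(S_numbers), len(S_ages))  # 2 6 40000
```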


All of the sample description spaces so far considered have been of
finite size.* However, there is no logical necessity for a sample description
space to be finite. Indeed, there are many important problems that require
sample description spaces of infinite size. We briefly mention two examples.
Suppose that we are observing a Geiger counter set up to record cosmic-ray
counts. The number of counts recorded may be any integer. Consequently,
as the sample description space S we would adopt the set {1, 2, 3, ...}
of all positive integers. Next, suppose we were measuring the time (in
microseconds) between two neighboring peaks on an electrocardiogram
or some other wiggly record; then we might take the set S = {real
numbers x: 0 < x < ∞} of all positive real numbers as our sample
description space.

* Given any set A of objects of any kind, the size of A is defined as the number of
members of A. Sets are said to be of finite size if their size is one of the finite numbers
1, 2, 3, .... Examples of sets of finite size are the following: the set of all the continents
in the world, which has size 7; the set of all the planets in the universe, which has size 9;
the set {2, 3, 5, 7, 11, 13} of all prime numbers from 1 to 15, which has size 6; the set
{(1, 4), (2, 3), (3, 2), (4, 1)} of 2-tuples of whole numbers between 1 and 6 whose sum is
5, which has size 4.
However, there are also sets of infinite (that is, nonfinite) size. Examples are the set of
all positive integers {1, 2, 3, ...} and the set of all points on the real line in the interval
between 0 and 1. If a set A has as many members as there are integers 1, 2, 3, ... (by which
is meant that a one-to-one correspondence may be set up between the members of A and
the members of the set {1, 2, 3, ...} of all integers), then A is said to be countably infinite.
The set of even integers {2, 4, 6, 8, ...} contains a countable infinity of members, as does
the set of odd integers {1, 3, 5, ...} and the set of primes. A set that is neither finite nor
countably infinite is said to be noncountably infinite. An interval on the real line, say the
interval between 0 and 1, contains a noncountable infinity of members.

It should be pointed out that the sample description space of a random
phenomenon is capable of being defined in more than one way. Observers
with different conceptions of what could possibly be observed will arrive
at different sample description spaces. For example, suppose one is
tossing a single coin. The sample description space might consist of two
members, which we denote by H (for heads) and T (for tails). In symbols,
S = {H, T}. However, the sample description space might consist of three
members, if we desired to include the possibility that the coin might stand
on its edge or rim. Then S = {H, T, R}, in which the description R
represents the possibility of the coin standing on its rim. There is yet a
fourth possibility; the coin might be lost by being tossed out of sight or
by rolling away when it lands. The sample description space would then
be S = {H, T, R, L}, in which the description L denotes the possibility of
loss.

Insofar as probability theory is the study of mathematical models of
random phenomena, it cannot give rules for the construction of sample
description spaces. Rather, the sample description space of a random
phenomenon is one of the undefined concepts with which the mathematical
theory begins. The considerations by which one chooses the correct sample
description space to describe a random phenomenon are a part of the art
of applying the mathematical theory of probability to the study of the real
world.


4. EVENTS 


The notion of the sample description space of a random phenomenon 
derives its importance from the fact that it provides a means to define the 
notion of an event. 

Let us first consider what is intuitively meant by an event. Let us
consider an urn containing six balls, of which two are white. Let the balls
be numbered 1 to 6, the white balls being numbered 1 and 2. Let two balls
be drawn from the urn, one after the other; the first ball drawn is not
returned to the urn before the second ball is drawn. The sample description
space S of this experiment is given by (3.1). Now some possible events
are (i) the event that the ball drawn on the first draw is white, (ii) the event
that the ball drawn on the second draw is white, (iii) the event that both
balls drawn are white, (iv) the event that the sum of the numbers on the
balls drawn is 7, (v) the event that the sum of the numbers on the balls
drawn is less than or equal to 4.

The mathematical formulation that we shall give of the notion of an
event depends on the following fact. For each of the events just described
there is a set of descriptions such that the event occurs if and only if the
observed outcome of the two draws has a description that lies in the set.
For example, the event that the ball drawn on the first draw is white can
be reformulated as the event that the description of the outcome of the
experiment belongs to the set {(1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 1), (2, 3),
(2, 4), (2, 5), (2, 6)}. Similarly, events (ii) to (v) described above may be
reformulated as the events that the description of the outcome of the
experiment belongs to the set (ii) {(2, 1), (3, 1), (4, 1), (5, 1), (6, 1), (1, 2),
(3, 2), (4, 2), (5, 2), (6, 2)}, (iii) {(1, 2), (2, 1)}, (iv) {(1, 6), (2, 5), (3, 4), (4, 3),
(5, 2), (6, 1)}, (v) {(1, 2), (2, 1), (1, 3), (3, 1)}.
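The reformulation of each verbal event as a set of descriptions amounts to filtering S by a condition. A minimal Python sketch, assuming the white balls are numbered 1 and 2 as above; the helper `event` is hypothetical:

```python
from itertools import permutations

S = set(permutations(range(1, 7), 2))  # sample description space (3.1)
WHITE = {1, 2}                         # balls 1 and 2 are white

def event(condition):
    """The event: the set of all descriptions in S satisfying the condition."""
    return {d for d in S if condition(d)}

first_white = event(lambda d: d[0] in WHITE)                     # (i)
second_white = event(lambda d: d[1] in WHITE)                    # (ii)
both_white = event(lambda d: d[0] in WHITE and d[1] in WHITE)    # (iii)
sum_is_7 = event(lambda d: d[0] + d[1] == 7)                     # (iv)
sum_le_4 = event(lambda d: d[0] + d[1] <= 4)                     # (v)

# An event occurs if and only if the observed description is a member of it.
observed = (2, 5)
print(observed in first_white, observed in second_white)  # True False
print(sorted(both_white))  # [(1, 2), (2, 1)]
```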

Consequently, we define an event as a set of descriptions. To say that an 
event E has occurred is to say that the outcome of the random situation under 
consideration has a description that is a member of E. Note that there are 
two notions being defined here, the notion of “an event" and the notion of 
“the occurrence of an event." The first notion represents a basic tool for 
the construction of mathematical models of random phenomena; the 
second notion is the basis of all translations of statements made in the 
mathematical model into statements about the real phenomenon. 

An alternate way in which the definition of an event may be phrased is 
in terms of the notion of subset. Consider two sets, E and F, of objects of 
any kind. We say that E is a subset of F, denoted E ⊂ F, if every member
of the set E is also a member of the set F. We now define an event as any 
subset of the sample description space S. In particular, the sample descrip- 
tion space S is a subset of itself and is thus an event. We call the sample 
description space S the certain event, since by the method of construction 
of S it will always occur. 

It is to be emphasized that in studying a random phenomenon our 
interest is in the events that can occur (or more precisely, in the probabilities 
with which they can occur). The sample description space is of interest 
not for the sake of its members, which are the descriptions, but for the 
sake of its subsets, which are the events! 

We next consider the relations that can exist among events and the 
operations that can be performed on events. One can perform on events 
algebraic operations similar to those of addition and multiplication that 
one can perform on ordinary numbers. The concepts to be presented in 
the remainder of this section may be called the algebra of events. If one
speaks of sets rather than of events, then the concepts of this section
constitute what is called set theory. 

Given any event E, it is as natural to ask for the probability that E will
not occur as it is to ask for the probability that E will occur. Thus, to any
event E, there is an event denoted by E^c and called the complement of E
(or E complement). The event E^c is the event that E does not occur and
consists of all descriptions in S which are not in E.

Let us next consider two events, E and F. We may ask whether E and F
both occurred or whether at least one of them (and possibly both) occurred.
Thus we are led to define the events EF and E ∪ F, called, respectively, the
intersection and union of the events E and F.

The intersection EF is defined as consisting of the descriptions that belong
to both E and F; consequently, the event EF is said to occur if and only if
both E and F occur, which is to say that the observed outcome has a
description that is a member of both E and F.

The union E ∪ F is defined as consisting of the descriptions that belong
to at least one of the events E and F; consequently, the event E ∪ F is said
to occur if and only if either E or F occurs, which is to say that the observed
outcome has a description that is a member of either E or F (or of both).

It should be noted that many writers denote the intersection of two
events by E ∩ F rather than by EF.

We may give a symbolic representation of these operations in a diagram
called a Venn diagram (Figs. 4A to 4C). Let the sample description space
S be represented by the interior of a rectangle in the plane; let the event E
be represented by the interior of a circle that lies within the rectangle; and
let the event F be represented by the interior of a square also lying within
the rectangle (but not necessarily overlapping the circle, although in
Fig. 4B it is drawn that way). Then E^c, the complement of E, is represented
in Fig. 4A by the points within the rectangle outside the circle; EF, the
intersection of E and F, is represented in Fig. 4B by the points within both the
circle and the square; E ∪ F, the union of E and F, is represented in
Fig. 4C by the points lying within the circle or the square.

As another illustration of the notions of the complement, union, and
intersection of events, let us consider the experiment of drawing a ball
from an urn containing twelve balls, numbered 1 to 12. Then S =
{1, 2, ..., 12}. Consider events E = {1, 2, 3, 4, 5, 6} and F = {4, 5, 6, 7, 8, 9}.
Then E^c = {7, 8, 9, 10, 11, 12}, EF = {4, 5, 6}, and E ∪ F = {1, 2, ..., 9}.
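For a finite sample description space, the three operations correspond directly to Python's built-in set operators. The complement is the difference from S, `&` gives the intersection, and `|` gives the union. A sketch reproducing the twelve-ball example:

```python
S = set(range(1, 13))        # balls numbered 1 to 12
E = {1, 2, 3, 4, 5, 6}
F = {4, 5, 6, 7, 8, 9}

E_c = S - E       # complement E^c: descriptions in S that are not in E
EF = E & F        # intersection EF: descriptions in both E and F
E_or_F = E | F    # union E ∪ F: descriptions in at least one of E, F

print(sorted(E_c))     # [7, 8, 9, 10, 11, 12]
print(sorted(EF))      # [4, 5, 6]
print(sorted(E_or_F))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```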


One of the main problems of the calculus of events is to establish the
equality of events defined in two different ways. Two events E and F
are said to be equal, written E = F, if every description in one event belongs
to the other. The definition of equality of two events may also be phrased
in terms of the notion of subevent. An event E is said to be a subevent of an
event F, written E ⊂ F, if the occurrence of E necessarily implies the
occurrence of F. In order for this to be true, every description in E must
belong also to F, so E is a subevent of F if and only if E is a subset of F.


Fig. 4A. A Venn diagram. The shaded area represents E^c.
Fig. 4B. A Venn diagram. The shaded area represents EF.
Fig. 4C. A Venn diagram. The shaded area represents E ∪ F.
Fig. 4D. A Venn diagram. The shaded area (or rather the lack of a shaded area)
represents the impossible event ∅, which is the intersection of the two mutually
exclusive events E and F.


We then have the basic principle that E equals F if and only if E is a
subevent of F and F is a subevent of E. In symbols,

$$
E = F \quad \text{if and only if} \quad E \subset F \text{ and } F \subset E.
\tag{4.1}
$$

The interesting question arises whether the operations of event union
and event intersection may be applied to an arbitrary pair of events E and
F. In particular, consider two events, E and F, that contain no descriptions
in common; for example, suppose S = {1, 2, 3, 4, 5, 6}, E = {1, 2}, F =
{3, 4}. The union E ∪ F = {1, 2, 3, 4} is defined. However, what
meaning is to be assigned to the intersection EF? To meet this need, we
introduce the notion of the impossible event, denoted by ∅. The impossible
event ∅ is defined as the event that contains no descriptions and therefore
cannot occur. In set theory the impossible event is called the empty set.
One important property of the impossible event is that it is the complement
of the certain event S; clearly S^c = ∅, for it is impossible for S not to
occur. A second important property of the impossible event is that it is
equal to the intersection of any event E and its complement E^c; clearly,
EE^c = ∅, for it is impossible for both an event and its complement to occur
simultaneously.

Any two events, E and F, that cannot occur simultaneously, so that
their intersection EF is the impossible event, are said to be mutually
exclusive (or disjoint). Thus, two events, E and F, are mutually exclusive
if and only if EF = ∅.

Two mutually exclusive events may be represented on a Venn diagram
by the interiors of two geometrical figures that do not overlap, as in
Fig. 4D. The impossible event may be represented on a Venn diagram by
the absence of any shaded area, as in Fig. 4D.

Events may be defined verbally, and it is important to be able to express
them in terms of the event operations. For example, let us consider two
events, E and F. The event that exactly one of the events, E and F, will
occur is equal to EF^c ∪ E^cF; the event that neither of the events,
E and F, will occur is equal to E^cF^c. The event that at least one (that is, one
or more) of the events, E or F, will occur is equal to E ∪ F. The event that
at most one (that is, one or less) of the events will occur is equal to (EF)^c =
E^c ∪ F^c.
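These verbal descriptions can be cross-checked by counting, for each description, how many of the events E and F contain it. A sketch on the twelve-ball example, assuming Python sets as events; the helper names `c` and `count` are illustrative:

```python
S = set(range(1, 13))
E = {1, 2, 3, 4, 5, 6}
F = {4, 5, 6, 7, 8, 9}

def c(A):
    """Complement of A relative to the sample description space S."""
    return S - A

exactly_one = (E & c(F)) | (c(E) & F)   # EF^c ∪ E^cF
neither = c(E) & c(F)                   # E^cF^c
at_least_one = E | F                    # E ∪ F
at_most_one = c(E & F)                  # (EF)^c

def count(s):
    """Number of the events E, F that contain the description s."""
    return (s in E) + (s in F)

# Each formula agrees with the direct count.
assert exactly_one == {s for s in S if count(s) == 1}
assert neither == {s for s in S if count(s) == 0}
assert at_least_one == {s for s in S if count(s) >= 1}
assert at_most_one == {s for s in S if count(s) <= 1}
print(sorted(exactly_one))  # [1, 2, 3, 7, 8, 9]
```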

The operations of event union and event intersection have many of the 
algebraic properties of ordinary addition and multiplication of numbers 
(although they are conceptually quite distinct from the latter operations). 
Among the important algebraic properties of the operations E ∪ F and
EF are the following relations, which hold for any events E, F, and G:

$$
\begin{array}{lll}
\text{Commutative law} & E \cup F = F \cup E & EF = FE \\
\text{Associative law} & E \cup (F \cup G) = (E \cup F) \cup G & E(FG) = (EF)G \\
\text{Distributive law} & E(F \cup G) = EF \cup EG & E \cup (FG) = (E \cup F)(E \cup G) \\
\text{Idempotency law} & E \cup E = E & EE = E
\end{array}
$$
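These laws can be checked exhaustively over every event on a small sample description space. Such a check only illustrates the laws on one finite space; it is not a proof. A Python sketch, assuming S = {1, 2, 3}:

```python
from itertools import combinations

S = frozenset({1, 2, 3})

# All 2**3 = 8 subsets of S, that is, all events on S.
events = [frozenset(c) for r in range(len(S) + 1)
          for c in combinations(S, r)]

for E in events:
    for F in events:
        assert E | F == F | E and E & F == F & E             # commutative
        for G in events:
            assert E | (F | G) == (E | F) | G                # associative
            assert E & (F & G) == (E & F) & G
            assert E & (F | G) == (E & F) | (E & G)          # distributive
            assert E | (F & G) == (E | F) & (E | G)
    assert E | E == E and E & E == E                         # idempotency

print(len(events), "events checked")  # 8 events checked
```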


Because the operations of union and intersection are commutative and
associative, there is no difficulty in defining the union and intersection of
an arbitrary number of events, E_1, E_2, ..., E_n, .... The union, written
E_1 ∪ E_2 ∪ ... ∪ E_n ∪ ..., is defined as the event consisting of all descrip-
tions that belong to at least one of the events. The intersection, written
E_1E_2 ... E_n ..., is defined as the event consisting of all descriptions that
belong to all the events.

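An unusual property of the event operations, which is used very fre-
quently, is given by de Morgan's laws, which state, for any two events, E
and F,

$$
(E \cup F)^c = E^c F^c, \qquad (EF)^c = E^c \cup F^c,
\tag{4.2}
$$

and for n events, E_1, E_2, ..., E_n,

$$
(E_1 \cup E_2 \cup \cdots \cup E_n)^c = E_1^c E_2^c \cdots E_n^c, \qquad
(E_1 E_2 \cdots E_n)^c = E_1^c \cup E_2^c \cup \cdots \cup E_n^c.
\tag{4.3}
$$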


An intuitive justification for (4.2) and (4.3) may be obtained by considering 
Venn diagrams. 
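In the same spirit as a Venn diagram, de Morgan's laws (4.2) and (4.3) can be checked by brute force over all events on a small space, here with n = 3 in (4.3). A sketch, assuming S = {1, 2, 3, 4}; again this illustrates the laws rather than proving them:

```python
from itertools import combinations, product

S = frozenset(range(1, 5))
events = [frozenset(c) for r in range(len(S) + 1)
          for c in combinations(S, r)]

def comp(A):
    """Complement of A relative to S."""
    return S - A

# (4.2): every pair of events.
for E, F in product(events, repeat=2):
    assert comp(E | F) == comp(E) & comp(F)
    assert comp(E & F) == comp(E) | comp(F)

# (4.3) with n = 3: every triple of events.
for Es in product(events, repeat=3):
    union = frozenset().union(*Es)
    inter = S.intersection(*Es)
    assert comp(union) == S.intersection(*map(comp, Es))
    assert comp(inter) == frozenset().union(*map(comp, Es))

print("de Morgan's laws hold on all", len(events), "events")
```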

In section 5 we require the following formulas for the equality of certain 
events. Let E and F be two events defined on the same sample description 
space S. Then 


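$$
E\emptyset = \emptyset, \qquad E \cup \emptyset = E.
\tag{4.4}
$$

$$
F = FE \cup FE^c, \qquad E \cup F = E \cup FE^c = F \cup EF^c.
\tag{4.5}
$$

$$
F \subset E \quad \text{implies} \quad EF = F, \quad E \cup F = E.
\tag{4.6}
$$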


In order to verify these identities, one must establish in each case that the
left-hand side of the identity is a subevent of the right-hand side and that 
the right-hand side is a subevent of the left-hand side. 
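The verification procedure just described, showing inclusion in both directions as in (4.1), can be mimicked on a small space. A sketch, assuming S = {1, 2, 3, 4}; the helper `equal_by_inclusion` is hypothetical and uses Python's subset operator `<=`:

```python
from itertools import combinations, product

S = frozenset(range(1, 5))
events = [frozenset(c) for r in range(len(S) + 1)
          for c in combinations(S, r)]
EMPTY = frozenset()  # the impossible event

def comp(A):
    """Complement of A relative to S."""
    return S - A

def equal_by_inclusion(A, B):
    """A = B iff A is a subevent of B and B is a subevent of A, as in (4.1)."""
    return A <= B and B <= A

for E, F in product(events, repeat=2):
    assert equal_by_inclusion(E & EMPTY, EMPTY)                  # (4.4)
    assert equal_by_inclusion(E | EMPTY, E)
    assert equal_by_inclusion(F, (F & E) | (F & comp(E)))        # (4.5)
    assert equal_by_inclusion(E | F, E | (F & comp(E)))
    assert equal_by_inclusion(E | F, F | (E & comp(F)))
    if F <= E:                                                  # (4.6)
        assert equal_by_inclusion(E & F, F)
        assert equal_by_inclusion(E | F, E)
```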


EXERCISES 


4.1. An experiment consists of drawing 3 radio tubes from a lot and testing them
for some characteristic of interest. If a tube is defective, assign the letter D
to it. If a tube is good, assign the letter G to it. A drawing is then described
by a 3-tuple, each of whose components is either D or G. For example,
(D, G, G) denotes the outcome that the first tube drawn was defective and
the remaining 2 were good. Let A_1 denote the event that the first tube drawn
was defective, A_2 denote the event that the second tube drawn was defective,
and A_3 denote the event that the third tube drawn was defective. Write
down the sample description space of the experiment and list all sample
descriptions in the events A_1, A_2, A_3, A_1 ∪ A_2, A_1 ∪ A_3, A_2 ∪ A_3,
A_1 ∪ A_2 ∪ A_3, A_1A_2, A_1A_3, A_2A_3, A_1A_2A_3.


4.2. For each of the following 16 events draw a Venn diagram similar to Figure
4A or 4B and on it shade the area corresponding to the event. Only 7
diagrams will be required to illustrate the 16 events, since some of the events
described are equivalent. (i) AB^c, (ii) AB^c ∪ A^cB, (iii) (A ∪ B)^c, (iv) A^cB^c,
(v) (AB)^c, (vi) A^c ∪ B^c, (vii) the event that exactly 0 of the events, A and B,
occurs, (viii) the event that exactly 1 of the events, A and B, occurs, (ix) the
event that exactly 2 of the events, A and B, occur, (x) the event that at least
0 of the events, A and B, occurs, (xi) the event that at least 1 of the events,
A and B, occurs, (xii) the event that at least 2 of the events, A and B, occur,
(xiii) the event that no more than 0 of the events, A and B, occurs, (xiv) the
event that no more than 1 of the events, A and B, occurs, (xv) the event that
no more than 2 of the events, A and B, occur, (xvi) the event that A occurs
and B does not occur. Remark: By "at least 1" we mean "1 or more," by
"no more than 1" we mean "1 or less," and so on.

4.3. Let S = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}, A = {1, 2, 3, 4, 5, 6}, and
B = {4, 5, 6, 7, 8, 9}. For each of the events described in exercise 4.2, write
out the numbers that are members of the event.

4.4. For each of the following 12 events draw a Venn diagram and on it shade
the area corresponding to the event: the event that of the events A, B, C,
there occur (i) exactly 0, (ii) exactly 1, (iii) exactly 2, (iv) exactly 3, (v) at least
0, (vi) at least 1, (vii) at least 2, (viii) at least 3, (ix) no more than 0, (x) no
more than 1, (xi) no more than 2, (xii) no more than 3.

4.5. Let S, A, B be as in exercise 4.3, and let C = {7, 8, 9}. For each of the
events described in exercise 4.4, write out the numbers that are members of
the event.

4.6. Prove (4.4). Note that (4.4) states that the impossible event behaves under
the operations of intersection and union in a manner similar to the way in
which the number 0 behaves under the operations of multiplication and
addition.

4.7. Prove (4.5). Show further that the events F and EF^c are mutually exclusive.


5. THE DEFINITION OF PROBABILITY AS A FUNCTION 
OF EVENTS ON A SAMPLE DESCRIPTION SPACE 


The mathematical notions are now at hand with which one may state the
postulates of a mathematical model of a random phenomenon. Let us
recall that in our heuristic discussion of the notion of a random phenomenon
in section 1 we accepted the so-called "frequency" interpretation of
probability, according to which the probability of an event E is a number
(which we denote by P[E]). This number can be known to us only by
experience as the result of a very long series of observations of independent
trials of the event E. (By a trial of E is meant an occurrence of the
phenomenon on which E is defined.) Having observed a long series of
trials, the probability of E represents the fraction of trials whose outcome
has a description that is a member of E. In view of the frequency inter-
pretation of P[E], it follows that a mathematical definition of the probability
of an event cannot tell us the value of P[E] for any particular event E.
Rather, a mathematical theory of probability must be concerned with the
properties of the probability of an event considered as a function defined
on all events. With these considerations in mind, we now give the following
definition of probability.

The definition of probability as a function of events on the subsets of a 
sample description space of a random phenomenon. 

Given a random situation, which is described by a sample description
space S, probability is a function* P[·] that to every event E assigns a
nonnegative real number, denoted by P[E] and called the probability of the
event E. The probability function must satisfy three axioms:

AXIOM 1. P[E] ≥ 0 for every event E.
AXIOM 2. P[S] = 1 for the certain event S.
AXIOM 3. P[E ∪ F] = P[E] + P[F] if EF = ∅, or, in words, the proba-
bility of the union of two mutually exclusive events is the sum of their
probabilities.


It should be clear that the properties stated by the foregoing axioms do
constitute a formal statement of some of the properties of the numbers
P[E] and P[F], interpreted to represent the relative frequency of occurrence
of the events E and F in a large number N of occurrences of the random
phenomenon on which they are defined. For any event E, let N_E be the
number of occurrences of E in the N occurrences of the phenomenon.
Then, by the frequency interpretation of probability, P[E] = N_E/N.
Clearly, P[E] ≥ 0. Next, N_S = N, since, by the construction of S, it
occurs on every occurrence of the random phenomenon. Therefore,
P[S] = 1. Finally, for two mutually exclusive events, E and F, N_{E∪F} =
N_E + N_F, since no occurrence of the phenomenon can be counted in both
N_E and N_F. Thus axiom 3 is satisfied.
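The argument can be illustrated with simulated data: whatever the outcomes happen to be, the relative frequencies N_E/N satisfy the three axioms exactly. A sketch, assuming a hypothetical run of N = 10,000 draws (with replacement) from the twelve-ball urn, generated with Python's `random` module and seeded for reproducibility:

```python
import random
from fractions import Fraction

rng = random.Random(0)  # fixed seed so the run is reproducible
N = 10_000
# Hypothetical simulated draws of one ball from an urn of balls 1 to 12.
draws = [rng.randint(1, 12) for _ in range(N)]

S = set(range(1, 13))
E = {1, 2, 3}
F = {7, 8}  # EF is empty: E and F are mutually exclusive

def rel_freq(event):
    """N_event / N: fraction of draws whose description lies in the event."""
    return Fraction(sum(d in event for d in draws), N)

assert rel_freq(E) >= 0                              # axiom 1
assert rel_freq(S) == 1                              # axiom 2
assert rel_freq(E | F) == rel_freq(E) + rel_freq(F)  # axiom 3, exactly
print(float(rel_freq(E)))  # near 3/12 = 0.25 for large N, not exactly
```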

It therefore follows that any property of probabilities that can be shown
to be a logical consequence of axioms 1 to 3 will hold for probabilities
interpreted as relative frequencies. We shall see that for many purposes
axioms 1 to 3 constitute a sufficient basis from which to derive the pro-
perties of probabilities. In advanced studies of probability theory, in
which more delicate questions concerning probability are investigated, it
is found necessary to strengthen the axioms somewhat. At the end of this
section we indicate briefly the two most important modifications required.

We now show how one can derive from axioms 1 to 3 some of the
important properties that probability possesses. In particular, we show
how axiom 3 suffices to enable us to compute the probabilities of events
constructed by means of complementations and unions of other events in
terms of the probabilities of these other events.


* Definition: A function is a rule that assigns a real number to each element of a set
of objects (called the domain of the function). Here the domain of the probability
function P[·] is the set of all events on S.




In order to be able to state briefly the hypotheses of the theorems 
subsequently proved, we need some terminology. It is to be emphasized 
that one can speak of the probability of an event only if the event is a 
subset of a definite sample description space S, on whose subsets a prob- 
ability function has been defined. Consequently, the hypothesis of a 
theorem concerning events should begin, “Let S be a sample description 
space on the subsets of which a probability function P[·] has been defined.
Let E and F be any two events on S.” For the sake of brevity, we write 
instead “Let E and F be any two events on a probability space"; by a 
probability space we mean a sample description space on which a proba- 
bility function (satisfying axioms 1, 2, and 3) has been defined. 


FORMULA FOR THE PROBABILITY OF THE IMPOSSIBLE EVENT ∅.

$$
P[\emptyset] = 0.
\tag{5.1}
$$

Proof: By (4.4) it follows that the certain event S and the impossible
event ∅ are mutually exclusive; further, their union S ∪ ∅ = S. Con-
sequently, P[S] = P[S ∪ ∅] = P[S] + P[∅], from which it follows that
P[∅] = 0.


FORMULA FOR THE PROBABILITY OF A DIFFERENCE FE^c OF TWO EVENTS
E AND F. For any two events, E and F, on a probability space

$$
P[FE^c] = P[F] - P[EF].
\tag{5.2}
$$

Proof: The events FE and FE^c are mutually exclusive, and their union
is F [compare (4.5)]. Then, by axiom 3, P[F] = P[EF] + P[FE^c], from
which (5.2) follows immediately.


FORMULA FOR THE PROBABILITY OF THE COMPLEMENT OF AN EVENT. For
any event E on a probability space

$$
P[E^c] = 1 - P[E].
\tag{5.3}
$$

Proof: Let F = S in (5.2). Since SE^c = E^c, SE = E, and P[S] = 1, we
have obtained (5.3).


FORMULA FOR THE PROBABILITY OF A UNION E ∪ F OF TWO EVENTS E AND
F. For any two events, E and F, on a probability space

$$
P[E \cup F] = P[E] + P[F] - P[EF].
\tag{5.4}
$$

Proof: We use the fact that the event E ∪ F may be written as the union
of the two mutually exclusive events, E and FE^c [compare (4.5)]. Then, by
axiom 3, P[E ∪ F] = P[E] + P[FE^c]. By evaluating P[FE^c] by (5.2), one
obtains (5.4).
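Formulas (5.1) to (5.4) can be checked on a concrete probability space. A sketch, assuming (for illustration only, since the axioms do not require it) that all 30 descriptions in (3.1) are equally likely and that balls 1 and 2 are white, as in section 4. Exact fractions avoid rounding:

```python
from fractions import Fraction
from itertools import permutations

# Assumption for illustration: the 30 descriptions in (3.1) are equally likely.
S = frozenset(permutations(range(1, 7), 2))

def P(event):
    """Probability of an event (a subset of S) under equally likely descriptions."""
    return Fraction(len(event), len(S))

def comp(A):
    """Complement of A relative to S."""
    return S - A

EMPTY = frozenset()
E = frozenset(d for d in S if d[0] in {1, 2})   # first ball drawn is white
F = frozenset(d for d in S if d[1] in {1, 2})   # second ball drawn is white

assert P(EMPTY) == 0                             # (5.1)
assert P(F & comp(E)) == P(F) - P(E & F)         # (5.2)
assert P(comp(E)) == 1 - P(E)                    # (5.3)
assert P(E | F) == P(E) + P(F) - P(E & F)        # (5.4)
print(P(E), P(E & F), P(E | F))  # 1/3 1/15 3/5
```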




Note that (5.4) extends axiom 3 to the case in which the events whose 
union is being formed are not necessarily mutually exclusive. 


We next obtain a basic property of the probability function, namely,
that if an event F is a subevent of another event E, then the probability
that F will occur is less than or equal to the probability that E will occur.


INEQUALITY FOR THE PROBABILITY OF A SUBEVENT. Let E and F be events
on a probability space S such that F ⊂ E (that is, F is a subevent of E).
Then


(5.5)    $P[EF^c] = P[E] - P[F]$  if $F \subset E$,
(5.6)    $P[F] \le P[E]$  if $F \subset E$.


Proof: By (5.2), with the roles of E and F interchanged, P[E] - P[EF] =
P[EF^c]. Now, since F ⊂ E, it follows that, as in (4.6), EF = F. Therefore,
P[E] - P[F] = P[EF^c], which proves (5.5). Next, P[EF^c] ≥ 0, by axiom 1.
Therefore, P[E] - P[F] ≥ 0, from which it follows that P[F] ≤ P[E], which
proves (5.6).


From the preceding inequality we may derive the basic fact that probabilities
are numbers between 0 and 1, inclusive:


(5.7)    $0 \le P[E] \le 1$  for any event E.


This is proved as follows. By axiom 1, 0 ≤ P[E]. Next, any event E is a
subevent of the certain event S. Therefore, by (5.6), P[E] ≤ P[S]. However,
by axiom 2, P[S] = 1, and the proof of the assertion is complete.
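A similar brute-force check covers (5.5) to (5.7). In the sketch below (a hypothetical three-point space with made-up weights, chosen only for illustration), the test `F <= E` is Python's subset comparison for sets, so the inner assertions run exactly over the pairs with F ⊂ E, and `E - F` stands for EF^c.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical three-point space with arbitrary weights summing to 1.
WEIGHTS = {1: Fraction(1, 6), 2: Fraction(1, 3), 3: Fraction(1, 2)}
S = frozenset(WEIGHTS)


def prob(event):
    return sum((WEIGHTS[d] for d in event), Fraction(0))


events = [frozenset(c) for r in range(len(S) + 1) for c in combinations(sorted(S), r)]

pairs_checked = 0
for E in events:
    assert 0 <= prob(E) <= 1                          # (5.7)
    for F in events:
        if F <= E:                                    # F is a subevent of E
            assert prob(E - F) == prob(E) - prob(F)   # (5.5), E - F is EF^c
            assert prob(F) <= prob(E)                 # (5.6)
            pairs_checked += 1

print(pairs_checked)   # 27 pairs (F, E) with F a subevent of E
```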


FORMULA FOR THE PROBABILITY OF THE UNION OF A FINITE NUMBER OF
MUTUALLY EXCLUSIVE EVENTS. For any positive integer n the probability
of the union of n mutually exclusive events E_1, E_2, ..., E_n is equal to the
sum of the probabilities of the events; in symbols,


(5.8)    $P[E_1 \cup E_2 \cup \cdots \cup E_n] = P[E_1] + P[E_2] + \cdots + P[E_n]$,

if, for every two integers i and j which are not equal and which are between
1 and n, inclusive, E_iE_j = ∅.


Proof: We use the principle of mathematical induction. Let p(n) denote the
proposition that (5.8) holds for any set of n mutually exclusive events.
That p(1) is true is obvious, since in the case that n = 1 (5.8) states that
P[E_1] = P[E_1]. Next, let n be a definite positive integer, and assume that
p(n) is true. Let us show that from the assumption that p(n) is true it
follows that p(n + 1) is true. Let E_1, E_2, ..., E_n, E_{n+1} be n + 1
mutually exclusive events. Since the events E_1 ∪ E_2 ∪ ... ∪ E_n and
E_{n+1} are then mutually exclusive, it follows, by axiom 3, that


(5.9)    $P[E_1 \cup E_2 \cup \cdots \cup E_{n+1}] = P[E_1 \cup E_2 \cup \cdots \cup E_n] + P[E_{n+1}]$.


From (5.9), and the assumption that p(n) is true, it follows that
P[E_1 ∪ ... ∪ E_{n+1}] = P[E_1] + ... + P[E_{n+1}]. We have thus shown that
p(n) implies p(n + 1). By the principle of mathematical induction, it holds
that the proposition p(n) applies to any positive integer n. The proof of
(5.8) is now complete.
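The induction step builds the union of n + 1 events from the union of the first n and the event E_{n+1}, applying axiom 3 once. The sketch below mirrors that step as a loop on a hypothetical weighted six-point space: `prob_union_disjoint` (a name introduced here for illustration) adds one event at a time, checking along the way that each new event is disjoint from the union so far, and its result is compared with P evaluated directly on the union.

```python
from fractions import Fraction

# Hypothetical six-point space with unequal weights k/21, which sum to 1.
WEIGHTS = {k: Fraction(k, 21) for k in range(1, 7)}


def prob(event):
    return sum((WEIGHTS[d] for d in event), Fraction(0))


def prob_union_disjoint(events):
    """P[E_1 U ... U E_n] for mutually exclusive events, built up one event at a time
    as in the induction step: P[(E_1 U ... U E_n) U E_{n+1}] = P[E_1 U ... U E_n] + P[E_{n+1}]."""
    union, total = set(), Fraction(0)
    for E in events:
        assert not (union & E), "events must be mutually exclusive"
        total = total + prob(E)      # axiom 3 applied to (union so far, E)
        union |= E
    return total


events = [{1}, {2, 3}, {5}, {6}]
lhs = prob(set().union(*events))     # P evaluated directly on the union
rhs = prob_union_disjoint(events)    # sum built up by repeated use of axiom 3
print(lhs, rhs)                      # 17/21 17/21
```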


The foregoing axioms are completely adequate for the study of random 
phenomena whose sample description spaces are finite. For the study of 
infinite sample description spaces, however, it is necessary to modify 
axiom 3. We may wish to consider an infinite sequence of mutually
exclusive events, E_1, E_2, ..., E_n, .... That the probability of the union
of an infinite number of mutually exclusive events is equal to the sum of 
the probabilities of the events cannot be proved by axiom 3 but must be 
postulated separately. Consequently, in advanced studies of probability 
theory, instead of axiom 3, the following axiom is adopted. 


AXIOM 3'. For any infinite sequence of mutually exclusive events,
E_1, E_2, ..., E_n, ...,


(5.10)    $P[E_1 \cup E_2 \cup \cdots \cup E_n \cup \cdots] = P[E_1] + P[E_2] + \cdots + P[E_n] + \cdots$.


A somewhat more esoteric modification in the foregoing axioms 
becomes necessary when we consider a random phenomenon whose 
sample description space S is noncountably infinite. It may then turn out 
that there are subsets of S that are nonprobabilizable, in the sense that it
is not possible to assign a probability to these sets in a manner consistent 
with the axioms. If such is the case, then only probabilizable subsets of S 
are defined as events. Since it may be proved that the union, intersection, 
and complements of events are events, this restriction of the notion of 
events causes no difficulty in application and renders the mathematical 
theory rigorous. 


EXERCISES 


5.1. Boole's inequality. For a finite set of events, A_1, A_2, ..., A_n,

(5.11)    $P[A_1 \cup A_2 \cup \cdots \cup A_n] \le P[A_1] + P[A_2] + \cdots + P[A_n]$.

Prove this assertion by means of the principle of mathematical induction.


5.2. Formula for the probability that exactly 1 of 2 events will occur. Show that
for any 2 events, A and B, on a probability space

(5.12)    $P[AB^c \cup BA^c] = P[A] + P[B] - 2P[AB]$.

The event AB^c ∪ BA^c is the event that exactly 1 of the events, A and B,
will occur. Contrast (5.12) with (5.4), which could be called the formula
for the probability that at least 1 of 2 events will occur.


5.3. Show that for any 3 events, A, B, and C, defined on a probability space,
the probability of the event that at least 1 of the events will occur is given by

$P[A \cup B \cup C] = P[A] + P[B] + P[C] - P[AB] - P[AC] - P[BC] + P[ABC]$.


5.4. Let A and B be 2 events on a probability space. Show that

$P[AB] \le P[A] \le P[A \cup B] \le P[A] + P[B]$.


5.5. Let A and B be 2 events on a probability space. In terms of P[A], P[B], and
P[AB], express (i) for k = 0, 1, 2, P[exactly k of the events, A and B, occur],
(ii) for k = 0, 1, 2, P[at least k of the events, A and B, occur], (iii) for
k = 0, 1, 2, P[at most k of the events, A and B, occur], (iv) P[A occurs and
B does not occur].


5.6. Let A, B, and C be 3 events on a probability space. In terms of P[A], P[B],
P[C], P[AB], P[AC], P[BC], and P[ABC] express for k = 0, 1, 2, 3 (i)
P[exactly k of the events, A, B, C, occur], (ii) P[at least k of the events,
A, B, C, occur], (iii) P[at most k of the events, A, B, C, occur].


5.7. Evaluate the probabilities asked for in exercise 5.5 in the case that
(0 Pld] = Р[В] = 1, P[AB] =}, (ii) PLA] = РІВ] = à, P[AB] = %›
iii) PL4] = Р[В] = 1. P[AB] = 0.


5.8. Evaluate the probabilities asked for in exercise 5.6 in the case that
(Q PLI = РІВ] = РІС] = 1, PLAB] = PLAC] = PLBC] = 3, PLABC] = 2,
Gi) РА] = P[B] = pic] = 3 P[AB] = P[AC] = P[BC] = P[ABC] = 0.


5.9. Suppose that a study of 900 college graduates 25 years after graduation
revealed that 300 were "successes," 300 had studied probability theory in
college, and 100 were both "successes" and students of probability theory.
Find, for k = 0, 1, 2, the number of persons in the group who had done,
of these two things, (i) exactly k, (ii) at least k, (iii) at most k.


5.10. In a battle in a small war 270 men fought; 90 lost an eye, 90 lost an arm,
and 90 lost a leg; 30 lost both an eye and an arm, 30 lost both an arm and a
leg, and 30 lost both a leg and an eye; 10 lost all three. Find, for k = 0, 1,
2, 3, the number of men who suffered (i) exactly k, (ii) at least k, (iii) no
more than k of these injuries.




5.11. Certain data obtained from a study of a group of 1000 subscribers to a 
certain magazine relating to their sex, marital status, and education were 
reported as follows: 312 males, 470 married, 525 college graduates, 42 
male college graduates, 147 married college graduates, 86 married males, 
and 25 married male college graduates. Show that the numbers reported 
in the various groups are not consistent. 


6. FINITE SAMPLE DESCRIPTION SPACES 


To gain some insight into the amount of freedom we have in defining
probability functions, it is useful to consider finite sample description
spaces. The sample description space S of a random observation or
experiment is defined as finite if it is of finite size, which is to say that the 
random observation or experiment under consideration possesses only a 
finite number of possible outcomes. 

Consider now a finite sample description space S, of size N. We may
then list the descriptions in S. If we denote the descriptions in S by D_1,
D_2, ..., D_N, then we may write S = {D_1, D_2, ..., D_N}. For example,
let S be the sample description space of the random experiment of tossing
two coins; if we define D_1 = (H, H), D_2 = (H, T), D_3 = (T, H), D_4 =
(T, T), then S = {D_1, D_2, D_3, D_4}.

It is shown in section 1 of Chapter 2 that 2^N possible events may be
defined on a sample description space of finite size N. For example, if
S = {D_1, D_2, D_3, D_4}, then there are sixteen possible events that may be
defined; namely, S, ∅, {D_1}, {D_2}, {D_3}, {D_4}, {D_1, D_2}, {D_1, D_3},
{D_1, D_4}, {D_2, D_3}, {D_2, D_4}, {D_3, D_4}, {D_1, D_2, D_3}, {D_1, D_2, D_4},
{D_1, D_3, D_4}, {D_2, D_3, D_4}.
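The count 2^N can be confirmed by listing the subsets directly. The sketch below, an illustration built on Python's `itertools.combinations`, generates every event on the two-coin space described above, grouped by size, and compares the total with 2^N. The sizes 1, 4, 6, 4, 1 correspond to ∅, the four single-member events, the six two-member events, the four three-member events, and S itself in the list above.

```python
from itertools import combinations

# The two-coin space of the text: D_1 = (H, H), D_2 = (H, T), D_3 = (T, H), D_4 = (T, T).
S = [("H", "H"), ("H", "T"), ("T", "H"), ("T", "T")]


def events_on(space):
    """All subsets of a finite space, listed by size: the empty set first, S last."""
    return [set(c) for r in range(len(space) + 1) for c in combinations(space, r)]


events = events_on(S)
sizes = [sum(1 for e in events if len(e) == r) for r in range(len(S) + 1)]
print(len(events), 2 ** len(S))   # 16 16
print(sizes)                      # [1, 4, 6, 4, 1]
```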

Consequently, to define a probability function P[·] on the subsets of S,
one needs to specify the 2^N values that P[A] assumes as A varies over the
events on S. However, the values of the probability function cannot be
specified arbitrarily but must be such that axioms 1 to 3 are satisfied.

There are certain events of particularly simple structure, called the
single-member events, on which it will suffice to specify the probability
function P[·] in order that it be specified for all events. A single-member
event is an event that contains exactly one description. If an event E has as
its only member the description D_j, this fact may be expressed in symbols
by writing E = {D_j}. Thus {D_j} is the event that occurs if and only if the
random situation being observed has description D_j. The reader should
note the distinction between D_j and {D_j}; the former is a description, the
latter is an event (which because of its simple structure is called a
single-member event).




► Example 6A. The distinction between a single-member event and a
sample description. Suppose that we are drawing a ball from an urn
containing six balls, numbered 1 to 6 (or, alternately, we may be observing
the outcome of the toss of a die, bearing numbers 1 to 6 on its sides). As
sample description space S, we take S = {1, 2, 3, 4, 5, 6}. The event,
denoted by {2}, that the outcome of the experiment is a 2 is a single-member
event. The event, denoted by {2, 4, 6}, that the outcome of the experiment
is an even number is not a single-member event. Note that 2 is a
description, whereas {2} is an event. ◄


A probability function P[·] defined on S can be specified by giving its
value P[{D_j}] on the single-member events {D_j} which correspond to the
members of S. Its value P[E] on any event E may then be computed by the
following formula:


FORMULA FOR CALCULATING THE PROBABILITIES OF EVENTS WHEN THE
SAMPLE DESCRIPTION SPACE IS FINITE. Let E be any event on a finite sample
description space S = {D_1, D_2, ..., D_N}. Then the probability P[E] of
the event E is the sum, over all descriptions D_j that are members of E, of
the probabilities P[{D_j}]; we express this symbolically by writing that if
E = {D_{i_1}, D_{i_2}, ..., D_{i_k}} then


(6.1)    $P[E] = P[\{D_{i_1}\}] + P[\{D_{i_2}\}] + \cdots + P[\{D_{i_k}\}]$.


To prove (6.1), one need only note that if E consists of the descriptions
D_{i_1}, D_{i_2}, ..., D_{i_k}, then E can be written as the union of the mutually
exclusive single-member events {D_{i_1}}, {D_{i_2}}, ..., {D_{i_k}}. Equation (6.1)
follows immediately from (5.8).


► Example 6B. Illustrating the use of (6.1). Suppose one is drawing a
sample of size 2 from an urn containing white and red balls. Suppose that
as the sample description space of the experiment one takes S =
{(W, W), (W, R), (R, W), (R, R)}. To specify a probability function P[·] on
S, one may specify the values of P[·] on the single-member events by a
table:


D       | (W, W) | (W, R) | (R, W) | (R, R)
P[{D}]  |  6/15  |  4/15  |  4/15  |  1/15


Let E be the event that the ball drawn on the first draw is white. The event
E may be represented as a set of descriptions by E = {(W, W), (W, R)}.
Then, by (6.1), P[E] = P[{(W, W)}] + P[{(W, R)}] = 6/15 + 4/15 = 2/3. ◄
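Formula (6.1) reduces P[E] to a sum over single-member events. The sketch below assumes, purely for illustration, that the two balls are drawn without replacement from an urn holding four white and two red balls (an assumption not stated in the example itself); it derives the single-member probabilities by treating every ordered pair of distinct balls as equally likely, then applies (6.1) to the event E = {(W, W), (W, R)}.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

# Assumed urn: four white (W) and two red (R) balls, two draws without replacement.
URN = ["W"] * 4 + ["R"] * 2

# Each ordered pair of distinct balls is taken as equally likely; balls are collapsed to colors.
pairs = Counter((URN[i], URN[j]) for i, j in permutations(range(len(URN)), 2))
total = sum(pairs.values())
P_single = {d: Fraction(n, total) for d, n in pairs.items()}   # P[{D}] for each description D


def prob(event):
    """Formula (6.1): add the probabilities of the single-member events in E."""
    return sum((P_single[d] for d in event), Fraction(0))


E = {("W", "W"), ("W", "R")}   # white ball on the first draw
print(prob(E))                 # 2/3
```

Under this assumption the single-member probabilities come out as 6/15, 4/15, 4/15, and 1/15 for (W, W), (W, R), (R, W), and (R, R).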




7. FINITE SAMPLE DESCRIPTION SPACES WITH EQUALLY 
LIKELY DESCRIPTIONS 


In many probability situations in which finite sample description spaces 
arise it may be assumed that all descriptions are equally likely; that is, all 
descriptions in S have equal probability of occurring. More precisely, we
define the sample description space S = {D_1, D_2, ..., D_N} as having equally
likely descriptions if all the single-member events on S have equal probabilities,
so that


(7.1)    $P[\{D_1\}] = P[\{D_2\}] = \cdots = P[\{D_N\}] = \frac{1}{N}$.


It should be clear that each of the single-member events {D_j} has
probability 1/N, since there are N such events, each of which has equal
probability, and the sum of their probabilities must equal 1, the probability 
of the certain event. 

The computation of the probability of an event, defined on a sample 
description space with equally likely descriptions, can be reduced to the 
computation of the size of the event. By (6.1), the probability of E is 
equal to (1/N), multiplied by the number of descriptions in E. In other 
words, the probability of E is equal to the ratio of the size of E to the size 
of S. If, for a set E of finite size, we let N[E] denote the size of E (the number
of members of E), then the foregoing conclusions can be summed up in a 
basic formula: 

FORMULA FOR CALCULATING THE PROBABILITIES OF EVENTS WHEN THE
SAMPLE DESCRIPTION SPACE S IS FINITE AND ALL DESCRIPTIONS ARE EQUALLY
LIKELY. For any event E on S

(7.2)    $P[E] = \frac{N[E]}{N[S]} = \frac{\text{size of } E}{\text{size of } S}$.

This formula can be stated in words. If an event is defined as a subset 
of a finite sample description space, whose descriptions are all equally 
likely, then the probability of the event is the ratio of the number of 
descriptions belonging to it to the total number of descriptions. This 
statement may be regarded as a precise formulation of the classical 
“equal-likelihood” definition of the probability of an event, first explicitly 
formulated by Laplace in 1812. 
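Formula (7.2) turns probability into counting. A minimal sketch, with `equally_likely_prob` a helper named here only for illustration: it rejects a set that is not a subset of S and otherwise returns the exact ratio N[E]/N[S]. The usage line applies it to the six-sided die of Example 6A and the event {2, 4, 6}, under the assumption that the six faces are equally likely.

```python
from fractions import Fraction


def equally_likely_prob(event, space):
    """Formula (7.2): P[E] = N[E] / N[S] when all descriptions in S are equally likely."""
    event, space = set(event), set(space)
    if not event <= space:
        raise ValueError("an event must be a subset of the sample description space")
    return Fraction(len(event), len(space))


S = {1, 2, 3, 4, 5, 6}                # the die (or urn) of Example 6A
even = {2, 4, 6}                      # the event that the outcome is an even number
print(equally_likely_prob(even, S))   # 1/2
```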

THE LAPLACEAN "EQUAL-LIKELIHOOD" DEFINITION OF THE PROBABILITY
OF A RANDOM EVENT. The probability of a random event is the ratio of the 
number of cases favoring it to the number of all possible cases, when 
nothing leads us to believe that one of these cases ought to occur rather 
than the others. This renders them, for us, equally possible. 




In view of (7.2), one sees that in adopting the axiomatic definition of 
probability given in section 5 one does not thereby reject the Laplacean 
definition of probability. Rather, the Laplacean definition is a special 
case of the axiomatic definition, corresponding to the case in which the 
sample description space is finite and the probability distribution on the 
sample description space is a uniform one. This is an alternate way of 
saying that all descriptions are equally likely. 

We may now state a mathematical model for the experiment of drawing 
a ball from an urn containing six balls, numbered 1 to 6, of which balls 
one to four are colored white and the remaining two balls are nonwhite. 
For the sample description space S of the experiment we take S =
{1, 2, 3, 4, 5, 6}. The event A that the ball drawn is white is then given as a
subset of S by A = {1, 2, 3, 4}. To compute the probability of A, we
must adopt a probability function P[·] on S. If we assume that the
descriptions in S are equally likely, then P[·] is determined by (7.2), and
P[A] = 4/6 = 2/3. On the other hand, we may specify a different probability
function P[·],
specified on the single-member events of S: 


bs 


РИ) = РОЛ = РЦЗ) = РФ] = 2, PES = PEG} = 1. 


Then the function PẸ] is determined by (6.1), and Р[А] = 1. 

We have thus stated two different mathematical models for the experi- 
ment of drawing a ball from an urn. Only the results of actual experiments 
can decide which of the two models is realistic. However, as we study the 
properties of various models in the course of this book, theoretical grounds 
will appear for preferring some kinds of models over others. 


► Example 7A. Find the probability that the thirteenth day of a randomly
chosen month is a Friday.


Solution: The sample description space of the experiment of observing
the day of the week upon which the thirteenth day of a randomly chosen
month will fall is clearly S = {Sunday, Monday, Tuesday, Wednesday,
Thursday, Friday, Saturday}. We are seeking P[{Friday}]. If we assume
equally likely descriptions, then P[{Friday}] = 1/7. However, would one
believe this conclusion in the face of the following alternative mathematical
model? To define a probability function on S, we first note that our calendar
has a period of 400 years, since every fourth year is a leap year, except for
years such as 1700, 1800, and 1900, at which a new century begins (or an old
century ends) but which are not multiples of 400. In 400 years there are 97
leap years and exactly 20,871 weeks (400 × 365 + 97 = 146,097 days, which is
7 × 20,871). For each of the 4800 dates between 1600 and 2000 that is the
thirteenth day of some month one may determine the day of the week on which
it falls. For any given day x of the week let us define P[{x}] as the relative
frequency of occurrence of x in the list of 4800 days of the week which arise
as the thirteenth day of some month. It may be shown by a direct but tedious
enumeration [see American Mathematical Monthly, Vol. 40 (1933), p. 607] that


(7.3)

x       | Sunday   | Monday   | Tuesday  | Wednesday | Thursday | Friday   | Saturday
P[{x}]  | 687/4800 | 685/4800 | 685/4800 | 687/4800  | 684/4800 | 688/4800 | 684/4800


Note that the probability model given by (7.3) leads to the conclusion that
the thirteenth of the month is more likely to be a Friday than any other
day of the week! ◄
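The enumeration behind (7.3) is tedious by hand but quick by machine. The sketch below assumes that Python's `datetime` module, which implements the proleptic Gregorian calendar, may stand in for the calendar described above. Because that calendar repeats every 400 years, any 400 consecutive years give the same counts; the sketch uses 1601 to 2000. `date.weekday()` numbers Monday as 0 and Sunday as 6.

```python
import datetime
from collections import Counter

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def thirteenths_by_weekday(first_year, years=400):
    """Count the weekdays on which the 13th of each month falls over `years`
    consecutive years of the (proleptic) Gregorian calendar."""
    counts = Counter()
    for year in range(first_year, first_year + years):
        for month in range(1, 13):
            counts[WEEKDAYS[datetime.date(year, month, 13).weekday()]] += 1
    return counts


counts = thirteenths_by_weekday(1601)   # the 400 years 1601 to 2000
for day in ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]:
    print(f"{day:9s} {counts[day]}/4800")
```

The printed numerators can be compared entry by entry with the table in (7.3).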


► Example 7B. Consider a state (such as Illinois) in which the license
plates of automobiles are numbered serially, beginning with 1. Assuming
that there are 3,000,000 automobiles registered in the state, what is the
probability that the first digit on the license plate of an automobile
selected at random will be the digit 1?

Solution: As the first digit on the license of a car, one may observe any
integer in the set {1, 2, 3, 4, 5, 6, 7, 8, 9}. Consequently, one may be
tempted to adopt this set as the sample description space. If one assumes
that all sample descriptions in this space are equally likely, then one would
arrive at the conclusion that the probability is 1/9 that the digit 1 will be the
first digit on the license plate of an automobile randomly selected from
the automobiles registered in Illinois. However, would one believe this
conclusion in the face of the following alternative model? As a result of
observing the number on a license plate, one may observe any number in
the set S consisting of all integers 1 to 3,000,000. The event A that one
observes a license plate whose first digit is 1 consists of the integers
enumerated in Table 7A. The set A has size N[A] = 1,111,111.


TABLE 7A
LICENSE PLATES WITH FIRST DIGIT 1

All License Plates in the Following Intervals Have First Digit 1 | Number of Integers in This Interval
1                        | 1
10-19                    | 10
100-199                  | 100
1000-1999                | 1000
10,000-19,999            | 10,000
100,000-199,999          | 100,000
1,000,000-1,999,999      | 1,000,000


If the set S is adopted as the sample description space and all descriptions
in S are assumed to be equally likely, then

$P[A] = \frac{N[A]}{N[S]} = \frac{1{,}111{,}111}{3{,}000{,}000} = 0.37037$. ◄
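The size N[A] can also be obtained without writing out Table 7A by hand. In the sketch below, `count_first_digit_one` (a helper named here for illustration) adds up, for each block 10^d to 2·10^d - 1, the part of that block lying in 1 to n, exactly as the rows of Table 7A do; `brute_force` is a slow cross-check by inspecting decimal strings, meant only for small n.

```python
def count_first_digit_one(n):
    """Number of integers in 1..n whose first decimal digit is 1.

    Sums the overlap of [1, n] with each block [10**d, 2 * 10**d - 1], as in Table 7A.
    """
    total, low = 0, 1
    while low <= n:
        high = 2 * low - 1                 # last integer of the block starting at 10**d
        total += min(high, n) - low + 1    # part of the block that lies in 1..n
        low *= 10
    return total


def brute_force(n):
    """Slow check: inspect the first character of each number's decimal string."""
    return sum(1 for k in range(1, n + 1) if str(k)[0] == "1")


n_plates = 3_000_000
N_A = count_first_digit_one(n_plates)
print(N_A, N_A / n_plates)   # N[A] and P[A] under equally likely plate numbers
```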


EXERCISES 


7.1. Suppose that a die (with faces marked 1 to 6) is loaded in such a manner
that, for k = 1, ..., 6, the probability of the face marked k turning up
when the die is tossed is proportional to k. Find the probability of the event
that the outcome of a toss of the die will be an even number.


7.2. What is the probability that the thirteenth of the month will be (i) a Friday
or a Saturday, (ii) a Saturday, Sunday, or Monday?


7.3. Let a number be chosen from the integers 1 to 100 in such a way that each of 
these numbers is equally likely to be chosen. What is the probability that 
the number chosen will be (i) a multiple of 7, (ii) a multiple of 14? 


7.4. Consider a state in which the license plates of automobiles are numbered
serially, beginning with 1. What is the probability that the first digit on the
license plate of an automobile selected at random will be the digit 1,
assuming that the number of automobiles registered in the state is equal to
(i) 999,999, (ii) 1,000,000, (iii) 1,500,000, (iv) 2,000,000, (v) 6,000,000?


7.5. What is the probability that a ball, drawn from an urn containing 3 red
balls, 4 white balls, and 5 blue balls, will be white? State carefully any
assumptions that you make.


7.6. A research problem. Using the same assumptions as those with which the
table in (7.3) was derived, find the probability that Christmas (December 25)
is a Monday. Indeed, show that the probability that Christmas will fall on a
given day of the week is supplied by the following table:


x       | Sunday | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday
P[{x}]  | 58/400 | 56/400 | 58/400  | 57/400    | 57/400   | 58/400 | 56/400


8. NOTES ON THE LITERATURE OF PROBABILITY THEORY 


The first book on probability theory, De Ratiociniis in Ludo Aleae, a
treatise on problems of games of chance, was published by Huyghens in
1657. There were no published writings on this subject before 1657,
although evidence exists that a number of fifteenth- and sixteenth-century
Italian mathematicians worked out the solutions to various probability
problems concerning games of chance. General methods of attack on
such problems seem first to have been given by Pascal and Fermat in a
celebrated correspondence, beginning in 1654. It is a fascinating cultural 
puzzle that the calculus of probability did not emerge until the seventeenth 
century, although random phenomena, such as those arising in games of 
chance, have always been present in man’s environment. For some en- 
lightening remarks on this puzzle see M. G. Kendall, Biometrika, Vol. 43 
(1956), pp. 9-12. A complete history of the development of probability 
theory during the period 1575 to 1825 is given by I. Todhunter, A History 
of the Mathematical Theory of Probability from the Time of Pascal to Laplace,
originally published in 1865 and reprinted in 1949 by Chelsea, New York. 

The work of Laplace marks a natural division in the history of proba- 
bility, since in his great treatise Théorie Analytique des Probabilités, first 
published in 1812, he summed up his own extensive work and that of his 
predecessors. Laplace also wrote a popular exposition for the educated
general public, which is available in English translation as A Philosophical 
Essay on Probabilities (with an introduction by E. T. Bell, Dover, New York, 
1951). 

The breadth of probability theory is today too immense for any one
man to be able to sum it up. One can list only the main references in 
English of which the student should be aware.* The literature of probability 
theory divides into three broad categories: (i) the nature (or foundations) 
of probability, (ii) mathematical probability theory, and (iii) applied 
probability theory. 

The nature of probability theory is a subject about which competent 
men differ. There are at least two main classes of concepts that historically 
have passed under the name of “probability.” It has been suggested that 
one distinguish between these two concepts by calling one probability_1
and the other probability_2 (this terminology is suggested by R. Carnap,
Logical Foundations of Probability, University of Chicago Press, 1950).
The theory of probability_1 is concerned with the problem of inductive
inference, with the nature of scientific proof, with the credibility of
propositions given empirical evidence, and in general with ways of reasoning
from empirical data to conclusions about future experiences. The theory
of probability_2 is concerned with the study of repetitive events that appear
to possess the property that their relative frequency of occurrence in a
large number of trials has a stable limit value. Enlightening discussions
of the theories of probability_1 and probability_2 are given, respectively, by
Sir Harold Jeffreys, Scientific Inference, Second Edition, Cambridge
University Press, Cambridge, 1957, and Richard von Mises, Probability,
Statistics, and Truth, Second Edition, Macmillan, New York, 1957. The
viewpoint of professional philosophers in regard to the nature of
probability theory is debated in "A Symposium on Probability," Philosophy
and Phenomenological Research, Vol. 5 (1945), pp. 449-532, Vol. 6 (1946),
pp. 11-86 and pp. 590-622. The philosophical implications of the use of
probability theory in scientific explanation are examined from the point of
view of the physicist in two books written for the educated layman:
Max Born, Natural Philosophy of Cause and Chance, Oxford University
Press, 1949, and David Bohm, Causality and Chance in Modern Physics,
London, Routledge and Kegan Paul, 1957.

* Important contributions to probability theory have been made by men of all
nationalities. In this section are mentioned only books available in the English
language. However, the reader should be aware that important works on probability
theory have been written in all the major languages of the world.

The mathematical theory of probability may be defined as consisting 
of those writings in which the viewpoint is the axiomatic one formulated 
in this chapter. This viewpoint developed in the twentieth century at the 
hands of such great probabilists as E. Borel, H. Steinhaus, P. Lévy, and
A. Kolmogorov.* The first systematic presentation of probability theory 
on an axiomatic basis was made in 1933 by Kolmogorov in a monograph 
available in English translation as Foundations of the Theory of Probability, 
Chelsea, New York, 1950. Several comprehensive treatises, in which are 
summarized the development of mathematical probability theory up to, 
say, 1950, are available: J. L. Doob, Stochastic Processes, Wiley, New 
York, 1953; B. V. Gnedenko and A. N. Kolmogorov, Limit Distributions 
for Sums of Independent Random Variables (translated by K. L. Chung),
Addison-Wesley, Cambridge, 1954; and M. Loève, Probability Theory:
Foundations, Random Sequences, Van Nostrand, New York, 1955. A 
number of monographs covering the developments of the last twenty years 
are in process of preparation by various authors. The reader may gain 
some idea of the scope of recent work in the mathematical theory of 
probability by consulting the section “Probability” in the monthly publica- 
tion Mathematical Reviews, which abstracts all published material on 
probability theory. 

Applied probability theory may be defined as consisting of those 
writings in which probability theory enters as a tool in a scientific or 
scholarly investigation. There are so many fields of engineering and the 
physical, natural, and social sciences to which probability theory has been 
applied that it is not possible to cite a short list of representative references. 
A number of references are given in this book in the examples in which we
discuss various applications of probability theory. Some idea of the diverse
applications of probability theory can be gained by consulting M. S.
Bartlett, Stochastic Processes, Cambridge University Press, 1955, or the
book by Feller cited below. The role of probability theory in mathematical
statistics is discussed in H. Cramér, Mathematical Methods of Statistics,
Princeton University Press, 1946.

* For references, see page 259 of the excellent book by Mark Kac, entitled
Probability and Related Topics in Physical Sciences, Interscience, New York, 1959,
and also Paul Lévy, "Random Functions: General Theory with Special Reference to
Laplacian Random Functions," University of California Publications in Statistics,
Vol. 1

The following books are classic introductions to probability theory that 
the reader can consult for alternate treatments of some of the topics 
discussed in this book: W. Feller, An Introduction to Probability Theory 
and its Applications, Second Edition, Wiley, New York, 1957; T. C. Fry, 
Probability and its Engineering Uses, Van Nostrand, New York, 1928; 
J. V. Uspensky, Introduction to Mathematical Probability, McGraw-Hill,
New York, 1937. Feller's inimitable book is especially recommended,
since it is simultaneously an introductory textbook and a treatise on
mathematical and applied probability theory.


CHAPTER 2


Basic Probability Theory


Many of the basic concepts of probability theory, as well as a large 
number of important problems of applied probability theory, may be 
considered in the context of finite sample description spaces and thus can 
be studied with a minimum of mathematical technique. In Chapters 2 and
3 only finite sample description spaces are considered. In this chapter we
further restrict ourselves to finite sample description spaces with equally
likely descriptions.


1. SAMPLES AND n-TUPLES 


A basic tool for the construction of sample description spaces of random
phenomena is provided by the notion of an n-tuple. An n-tuple (z_1, z_2, ..., z_n)
is an array of n symbols, z_1, z_2, ..., z_n, which are called, respectively, the
first component, the second component, and so on, up to the nth component,
of the n-tuple. The order in which the components of an n-tuple are
written is of importance (and consequently one sometimes speaks of
ordered n-tuples). Two n-tuples (z_1, z_2, ..., z_n) and (z_1', z_2', ..., z_n') are
said to be identical, or indistinguishable, if and only if they consist of the
same components written in the same order; symbolically, z_k = z_k' for
k = 1, 2, ..., n.

derives from the fact that


A basic random phenomenon with whose analysis we are concerned in
probability theory is that of sampling. Suppose we have an urn containing 
M balls, which are numbered 1 to M. Suppose we draw balls from the 
urn one at a time, until n balls have been drawn; for brevity, we say we
have drawn a sample (or an ordered sample) of size n. Of course, we 
must also specify whether the sample has been drawn with replacement or 
without replacement. 

The drawing is said to be done with replacement, and the sample is said 
to be drawn with replacement, if after each draw the number of the ball 
drawn is recorded, but the ball itself is returned to the urn. The drawing 
is said to be done without replacement, and the sample is said to be drawn 
without replacement, if the ball drawn is not returned to the urn after each 
draw, so that the number of balls available in the urn for the kth draw is 
M - k + 1. Consequently, if the drawing is done without replacement,
then the size n of the sample drawn must be less than or equal to M, the
original number of balls in the urn. On the other hand, if the drawing is
done with replacement, then n may be any number.

To report the result of drawing a sample of size n, an n-tuple (z_1, z_2, ..., z_n)
is used, in which z_1 represents the number of the ball drawn on the first
draw, z_2 represents the number of the ball drawn on the second draw, and
so on, up to z_n, which represents the number of the ball drawn on the nth
draw.


► Example 1A. All possible samples of size 3 from an urn containing four
balls. Let us consider an urn which contains four balls, numbered 1 to 4,
and let a sample of size 3 be drawn. If the sampling is done without
replacement, then the possible samples that can be drawn are

(1, 2, 3), (2, 3, 1), (3, 4, 1), (4, 1, 2)
(1, 2, 4), (2, 3, 4), (3, 4, 2), (4, 1, 3)
(1, 3, 2), (2, 4, 1), (3, 1, 2), (4, 2, 1)
(1, 3, 4), (2, 4, 3), (3, 1, 4), (4, 2, 3)
(1, 4, 2), (2, 1, 3), (3, 2, 1), (4, 3, 1)
(1, 4, 3), (2, 1, 4), (3, 2, 4), (4, 3, 2)

If the sampling is done with replacement, then the possible samples that
can be drawn are

(1, 1, 1), (2, 1, 1), (3, 1, 1), (4, 1, 1)
(1, 1, 2), (2, 1, 2), (3, 1, 2), (4, 1, 2)
(1, 1, 3), (2, 1, 3), (3, 1, 3), (4, 1, 3)
(1, 1, 4), (2, 1, 4), (3, 1, 4), (4, 1, 4)
(1, 2, 1), (2, 2, 1), (3, 2, 1), (4, 2, 1)
(1, 2, 2), (2, 2, 2), (3, 2, 2), (4, 2, 2)
(1, 2, 3), (2, 2, 3), (3, 2, 3), (4, 2, 3)
(1, 2, 4), (2, 2, 4), (3, 2, 4), (4, 2, 4)
(1, 3, 1), (2, 3, 1), (3, 3, 1), (4, 3, 1)
(1, 3, 2), (2, 3, 2), (3, 3, 2), (4, 3, 2)
(1, 3, 3), (2, 3, 3), (3, 3, 3), (4, 3, 3)
(1, 3, 4), (2, 3, 4), (3, 3, 4), (4, 3, 4)
(1, 4, 1), (2, 4, 1), (3, 4, 1), (4, 4, 1)
(1, 4, 2), (2, 4, 2), (3, 4, 2), (4, 4, 2)
(1, 4, 3), (2, 4, 3), (3, 4, 3), (4, 4, 3)
(1, 4, 4), (2, 4, 4), (3, 4, 4), (4, 4, 4) ◄
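Python's `itertools` module can generate both lists of Example 1A directly: `permutations(balls, 3)` yields the ordered samples without replacement and `product(balls, repeat=3)` those with replacement. The sketch below only counts and displays them; its ordering differs from the arrangement above, but the two sets of samples are the same.

```python
from itertools import permutations, product

balls = [1, 2, 3, 4]   # an urn with four balls numbered 1 to 4
n = 3                  # sample size

without_replacement = list(permutations(balls, n))   # ordered samples, no ball repeated
with_replacement = list(product(balls, repeat=n))    # ordered samples, repeats allowed

print(len(without_replacement), len(with_replacement))   # 24 64
print(without_replacement[:3])   # [(1, 2, 3), (1, 2, 4), (1, 3, 2)]
```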


As indicated in section 7 of Chapter 1, many probability problems 
defined on finite sample description spaces may be reduced to problems of 
counting. Consequently, it is useful to know the basic principles of 
combinatorial analysis by which the size of sets of n-tuples, which arise 
in various ways, may be counted. We now state a formula that is basic to
the theory of counting sets of n-tuples and that may be called the basic
principle of combinatorial analysis.

Suppose there is a set A whose members are ordered n-tuples of objects
of some sort. In order to compute the size of A, first determine the number
$N_1$ of objects that may be used as the first component of an n-tuple in A.
Next determine (if it exists*) the number $N_2$ of objects that may be second
components of an n-tuple, of which the first component is known. Then
determine (if it exists) the number $N_3$ of objects that may be third
components of an n-tuple, of which the first two components are known.
Continue in this manner until the number $N_n$ (if it exists) of objects that
may be the nth component of an n-tuple, of which the first (n - 1)
components are known, has been determined. The size of the set A of n-tuples
is then given by the product of the numbers $N_1, N_2, \ldots, N_n$; in symbols,

$$N[A] = N_1 N_2 \cdots N_n. \tag{1.1}$$

As a first application of this basic principle, suppose that we have n
different kinds of objects: $N_1$ objects $a_1^{(1)}, \ldots, a_{N_1}^{(1)}$ of the
first kind, $N_2$ objects $a_1^{(2)}, \ldots, a_{N_2}^{(2)}$ of the second kind, and so
on, up to $N_n$ objects $a_1^{(n)}, \ldots, a_{N_n}^{(n)}$ of the nth kind. We may then form
$N_1 N_2 \cdots N_n$ ordered n-tuples $(a_{j_1}^{(1)}, a_{j_2}^{(2)}, \ldots, a_{j_n}^{(n)})$
containing one element of each kind.


► Example 1B. A man has five suits, three pairs of shoes, and two hats.
How many different combinations of attire can he wear?

* The number $N_2$ exists if the number of possible second components that may occur
in an n-tuple, of which the first component is known, does not depend on which first
component has occurred.


SEC. 1 SAMPLES AND #-TUPLES 35 


Solution: A combination of attire is a 3-tuple $(a_1, a_2, a_3)$, in which
$a_1, a_2, a_3$ denote, respectively, the suit, shoes, and hat worn. By the
basic principle of combinatorial analysis there are $5 \cdot 3 \cdot 2 = 30$
combinations of attire. ◄
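As a quick check of the basic principle (1.1) on example 1B, one may list every attire explicitly. In this Python sketch the item labels are placeholders of our own; only the counts 5, 3, and 2 come from the example.

```python
from itertools import product
from math import prod

suits = [f"suit {i}" for i in range(1, 6)]
shoes = [f"shoes {i}" for i in range(1, 4)]
hats = [f"hat {i}" for i in range(1, 3)]

# Each attire is a 3-tuple (suit, shoes, hat) with one component of each kind.
outfits = list(product(suits, shoes, hats))

# The basic principle predicts N_1 * N_2 * N_3 such 3-tuples.
predicted = prod(len(kind) for kind in (suits, shoes, hats))
print(len(outfits), predicted)  # 30 30
```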


We next apply the basic principle of combinatorial analysis to determine 
the number of samples of size n that can be drawn with or without replace- 
ment from an urn containing M distinguishable balls. 

The number of ways in which one can draw a sample of n balls from an
urn containing M distinguishable balls is $M(M-1)\cdots(M-n+1)$ if
the sampling is done without replacement, and $M^n$ if the sampling is done
with replacement.

To show the first of these statements, note that there are M possible
choices of numbers for the first ball drawn, M - 1 choices of numbers
for the second ball drawn, and so on, until finally there are
M - n + 1 = M - (n - 1) choices of numbers for the nth ball drawn; by (1.1)
the number of samples is the product of these numbers. The second statement
follows by a similar argument, since for each of the n balls in the sample
there are M choices.
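The two counting formulas can be confirmed by brute force for small urns. The Python sketch below is an illustrative check rather than a proof; `falling_product` is a name we introduce for the product $M(M-1)\cdots(M-n+1)$.

```python
from itertools import permutations, product


def falling_product(M, n):
    """M(M-1)...(M-n+1): the number of samples of size n without replacement."""
    result = 1
    for i in range(n):
        result *= M - i
    return result


# Compare both formulas with direct enumeration for small urns.
for M in range(1, 7):
    balls = range(1, M + 1)
    for n in range(1, M + 1):
        assert len(list(permutations(balls, n))) == falling_product(M, n)
        assert len(list(product(balls, repeat=n))) == M ** n

print(falling_product(6, 2), 6 ** 2)  # 30 36
```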

Various notations have been adopted to denote the product
$M(M-1)\cdots(M-n+1)$. We adopt the notation $(M)_n$. We thus define, for any
positive integer M = 1, 2, ..., and for any integer n = 1, 2, ..., M,

$$(M)_n = M(M-1)\cdots(M-n+1). \tag{1.2}$$


Another notation with which the reader should be familiar is that of the
factorial. Given any positive integer M, we define M! (read, M factorial)
as the product of all the integers 1 to M. Thus

$$M! = 1 \cdot 2 \cdots (M-1)M. \tag{1.3}$$


We can write $(M)_n$ in terms of the factorial notation by

$$(M)_n = \frac{M!}{(M-n)!}, \qquad n = 0, 1, 2, \ldots, M. \tag{1.4}$$

In order that (1.4) may hold for n = M, we define

$$0! = 1. \tag{1.5}$$

In order that (1.4) may hold for n = 0, we define

$$(M)_0 = 1. \tag{1.6}$$




► Example 1C. $(4)_0 = 1$, $(4)_1 = 4$, $(4)_2 = 12$, $(4)_3 = 24$, $(4)_4 = 4! = 24$.
Note that $(4)_5$ is undefined at present. It is later defined as having
value 0. ◄
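Definitions (1.2) to (1.6) translate directly into code. In the following Python sketch, `falling_factorial` is our own helper; `math.factorial` and `math.perm` are standard-library functions (the latter requires Python 3.8 or later). Note that `math.perm(4, 5)` already returns 0, the value the text later assigns to $(4)_5$.

```python
from math import factorial, perm


def falling_factorial(M, n):
    """(M)_n = M(M-1)...(M-n+1) as in (1.2); the empty product gives (M)_0 = 1, as in (1.6)."""
    result = 1
    for i in range(n):
        result *= M - i
    return result


# Check relation (1.4), (M)_n = M!/(M-n)!; the case n = M uses 0! = 1 from (1.5).
for M in range(1, 8):
    for n in range(M + 1):
        assert falling_factorial(M, n) == factorial(M) // factorial(M - n)
        assert falling_factorial(M, n) == perm(M, n)

print([falling_factorial(4, n) for n in range(5)])  # [1, 4, 12, 24, 24]  (example 1C)
print(perm(4, 5))  # 0
```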


An important application of the foregoing relations is to the problem
of finding the number of subsets of a set. Consider the set S = {1, 2, ..., N},
which consists of all integers 1 to N. How many possible subsets of S
can be formed? In order to solve this problem, we first find for k = 1,
2, ..., N the number of subsets of S of size k that can be formed. Let $x_k$
be the number of subsets of S of size k. We shall prove that $x_k$ satisfies
the relationship $x_k \cdot k! = (N)_k$, so that

$$x_k = \frac{(N)_k}{k!}. \tag{1.7}$$

To see this, regard each subset of S of size k as an urn (containing k
distinguishable balls) from which samples of size k are being drawn
without replacement; the number of samples that can be drawn in this
manner is k!. On the other hand, the number of samples of size k, drawn
without replacement, that can be drawn from the set S, regarded as an urn
containing N distinguishable balls, is $(N)_k$. A little reflection will convince
the reader that all the samples without replacement of size k that can be
drawn from S can be obtained by first choosing a subset of S of size k
from which one then draws all possible samples without replacement of
size k. Consequently, $x_k \cdot k! = (N)_k$, or, in words, the number of subsets of
S of size k, multiplied by the number of samples that can be drawn without
replacement from a subset of size k, is equal to the number of samples of size
k that can be drawn without replacement from S itself.

We now introduce some notation. We define, for any integer N = 1,
2, ..., and integer k = 0, 1, ..., N, the symbol $\binom{N}{k}$ by

$$\binom{N}{k} = \frac{(N)_k}{k!} = \frac{N(N-1)\cdots(N-k+1)}{1 \cdot 2 \cdots k} = \frac{N!}{k!\,(N-k)!}. \tag{1.8}$$

Equation (1.7) may be restated as follows: the number of subsets of size k
that may be formed from the members of a set of size N is $\binom{N}{k}$.


► Example 1D. The subsets of size 3 of a set of size 4. Consider the set
{1, 2, 3, 4}. There are $\binom{4}{3} = 4$ subsets of size 3 that can be formed,
namely, {1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4}. Notice that from each of
these subsets one may draw without replacement six possible samples,
so that there are twenty-four possible samples of size 3 to be drawn without
replacement from an urn containing four balls. ◄
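The argument that $x_k \cdot k! = (N)_k$ can be checked numerically by listing subsets and ordered samples separately. This Python sketch uses `itertools.combinations`, `itertools.permutations`, and `math.comb` from the standard library; the two counting helpers are our own.

```python
from itertools import combinations, permutations
from math import comb, factorial


def count_subsets(N, k):
    """x_k: the number of subsets of size k of S = {1, ..., N}, found by listing them."""
    return sum(1 for _ in combinations(range(1, N + 1), k))


def count_ordered_samples(N, k):
    """(N)_k: samples of size k drawn without replacement from S."""
    return sum(1 for _ in permutations(range(1, N + 1), k))


for N in range(1, 8):
    for k in range(N + 1):
        # Each subset of size k yields exactly k! ordered samples.
        assert count_subsets(N, k) * factorial(k) == count_ordered_samples(N, k)
        assert count_subsets(N, k) == comb(N, k)

print(list(combinations(range(1, 5), 3)))  # [(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)]
```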


The quantities $\binom{N}{k}$ are generally called binomial coefficients because of
the role they play in the binomial theorem, which states that for any two
real numbers a and b and any positive integer N

$$(a+b)^N = \sum_{k=0}^{N} \binom{N}{k} a^k b^{N-k} \tag{1.9}$$
$$= \binom{N}{0} b^N + \binom{N}{1} a b^{N-1} + \binom{N}{2} a^2 b^{N-2} + \cdots + \binom{N}{N-1} a^{N-1} b + \binom{N}{N} a^N.$$
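A numerical spot check of the binomial theorem (1.9) is easy to write. The Python sketch below (our own illustration) evaluates the right-hand side with `math.comb` and compares it with $(a+b)^N$, using `fractions.Fraction` so that the comparison is exact rather than subject to floating-point rounding.

```python
from fractions import Fraction
from math import comb


def binomial_expansion(a, b, N):
    """Right-hand side of (1.9): the sum over k = 0..N of C(N, k) a^k b^(N-k)."""
    return sum(comb(N, k) * a**k * b**(N - k) for k in range(N + 1))


# Exact rational arithmetic, so equality can be tested with ==.
pairs = [(Fraction(1), Fraction(1)), (Fraction(2, 3), Fraction(-5, 7)), (Fraction(3), Fraction(4))]
for a, b in pairs:
    for N in range(1, 10):
        assert binomial_expansion(a, b, N) == (a + b) ** N

print(binomial_expansion(2, 3, 4), (2 + 3) ** 4)  # 625 625
```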
It is convenient to extend the definitions of $\binom{N}{k}$ and $(N)_k$ to any positive
or negative integer k. We define, for N = 1, 2, ...,

$$(N)_0 = \binom{N}{0} = 1, \qquad (N)_k = \binom{N}{k} = 0 \quad \text{if either } k < 0 \text{ or } k > N. \tag{1.10}$$
We next note the extremely useful relation, holding for N = 1, 2, ...
and k = 0, ±1, ±2, ...,

$$\binom{N+1}{k} = \binom{N}{k} + \binom{N}{k-1}. \tag{1.11}$$

This relation may be verified directly from the definition of binomial
coefficients. An intuitive justification of (1.11) can be obtained as follows.
Given a set S with N + 1 members, choose an element t in S. The number of
subsets of S of size k in which t is not present is equal to $\binom{N}{k}$, whereas
the number of subsets of S of size k in which t is present is $\binom{N}{k-1}$;
the sum of these two quantities is equal to $\binom{N+1}{k}$, the total number of
subsets of S of size k.


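Equation (1.11) is the algebraic expression of a fact represented in
tabular form by Pascal's triangle, whose row N (counting the top row as
row 0) lists $\binom{N}{0}, \binom{N}{1}, \ldots, \binom{N}{N}$:

            1
          1   1
        1   2   1
      1   3   3   1
    1   4   6   4   1

and so on. Equation (1.11) expresses the fact that each term in Pascal's
triangle is the sum of the two terms above it.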
One also notices in Pascal's triangle that the entries on each line
are symmetric about the middle entry (or entries). More precisely, the
binomial coefficients have the property that for any positive integer N and
k = 0, 1, 2, ..., N

$$\binom{N}{k} = \binom{N}{N-k}. \tag{1.12}$$

To prove (1.12) one need note only that each side of the equation is equal
to $N!/[k!\,(N-k)!]$.
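Relation (1.11) is also a practical way to compute binomial coefficients without factorials. The Python sketch below, which is ours and not the text's, builds the rows of Pascal's triangle from (1.11), treating coefficients outside 0 ≤ k ≤ N as 0 in the spirit of (1.10), and then checks each row against `math.comb` and the symmetry (1.12).

```python
from math import comb


def pascal_rows(n_rows):
    """Rows 0 to n_rows - 1 of Pascal's triangle, built with relation (1.11)."""
    rows = [[1]]
    for N in range(n_rows - 1):
        # Pad with zeros: C(N, -1) = C(N, N + 1) = 0, in the spirit of (1.10).
        padded = [0] + rows[-1] + [0]
        # Entry k of row N + 1 is C(N, k - 1) + C(N, k).
        rows.append([padded[k] + padded[k + 1] for k in range(N + 2)])
    return rows


rows = pascal_rows(8)
for N, row in enumerate(rows):
    assert row == [comb(N, k) for k in range(N + 1)]
    assert row == row[::-1]  # symmetry (1.12)

for row in rows[:5]:
    print(row)
# [1]
# [1, 1]
# [1, 2, 1]
# [1, 3, 3, 1]
# [1, 4, 6, 4, 1]
```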


It should be noted that with (1.11) and the aid of the principle of 
mathematical induction one may prove the binomial theorem. 


The mathematical facts are now at hand to determine how many
subsets of a set of size N one may form. From the binomial theorem (1.9),
with a = b = 1, it follows that

$$2^N = 1 + \binom{N}{1} + \binom{N}{2} + \cdots + \binom{N}{N}. \tag{1.13}$$

From (1.13) it follows that the number of events (including the impossible
event) that can be formed on a sample description space of size N is $2^N$.
For there is one impossible event, $\binom{N}{1}$ events of size 1, $\binom{N}{2}$ events of
size 2, ..., $\binom{N}{k}$ events of size k, ..., $\binom{N}{N-1}$ events of size N - 1, and
$\binom{N}{N}$ events of size N. There is an alternate way of showing that if S has
N members then it has $2^N$ subsets. Let the members of S be numbered 1 to
N. To describe a subset A of S, we may write an N-tuple $(t_1, t_2, \ldots, t_N)$
whose jth component $t_j$ is equal to 1 or 0, depending on whether the jth
member of S does or does not belong to the subset A. Since one can form
$2^N$ N-tuples, it follows that S possesses $2^N$ subsets.
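The alternate argument, which describes a subset by an N-tuple of 0s and 1s, doubles as an algorithm for listing all subsets. The following Python sketch is one illustrative way to write it; the function name is our own.

```python
from itertools import product
from math import comb


def subsets_from_indicators(S):
    """List all subsets of S via N-tuples (t_1, ..., t_N) of 0s and 1s."""
    S = list(S)
    subsets = []
    for t in product((0, 1), repeat=len(S)):
        # t_j = 1 means the jth member of S belongs to the subset.
        subsets.append({s for s, t_j in zip(S, t) if t_j == 1})
    return subsets


# Both counts agree with (1.13).
for N in range(9):
    subs = subsets_from_indicators(range(1, N + 1))
    assert len(subs) == 2 ** N == sum(comb(N, k) for k in range(N + 1))

print(len(subsets_from_indicators(range(1, 6))))  # 32
```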

Another counting problem whose solution we shall need is that of
finding the number of partitions of a set of size N and, in particular, of the
set S = {1, 2, ..., N}. Let r be a positive integer and let $k_1, k_2, \ldots, k_r$ be
positive integers such that $k_1 + k_2 + \cdots + k_r = N$. By a partition of S,
with respect to r and $k_1, k_2, \ldots, k_r$, we mean a division of S into r subsets
(ordered so that one may speak of a first subset, a second subset, etc.) such
that the first subset has size $k_1$, the second subset has size $k_2$, and so on,
up to the rth subset, which has size $k_r$.


► Example 1E. Partitions of a set of size 4. The possible partitions of the
set {1, 2, 3, 4} into three subsets, the first subset of size 1, the second
subset of size 2, and the third subset of size 1, may be listed as follows:

({1}, {2, 3}, {4}),   ({2}, {1, 3}, {4}),
({1}, {2, 4}, {3}),   ({2}, {1, 4}, {3}),
({1}, {3, 4}, {2}),   ({2}, {3, 4}, {1}),
({3}, {1, 2}, {4}),   ({4}, {1, 2}, {3}),
({3}, {1, 4}, {2}),   ({4}, {1, 3}, {2}),
({3}, {2, 4}, {1}),   ({4}, {2, 3}, {1}). ◄


We now prove that the number of ways in which one can partition a set
of size N into r ordered subsets so that the first subset has size $k_1$, the second
subset has size $k_2$, and so on, where $k_1 + k_2 + \cdots + k_r = N$, is the product

$$\binom{N}{k_1}\binom{N-k_1}{k_2}\binom{N-k_1-k_2}{k_3}\cdots\binom{N-k_1-k_2-\cdots-k_{r-1}}{k_r}. \tag{1.14}$$

To prove (1.14) we proceed as follows. For the first subset of $k_1$ items
there are N items available, so that there are $\binom{N}{k_1}$ ways in which the subset
of $k_1$ items can be selected. There are $N - k_1$ items available from which
to select the $k_2$ items that go into the second subset; consequently, the
second subset, containing $k_2$ items, can be selected in $\binom{N-k_1}{k_2}$ ways.
Continuing in this manner, we determine that the rth subset, containing
$k_r$ items, can be selected in $\binom{N-k_1-k_2-\cdots-k_{r-1}}{k_r}$ ways. By multiplying
these expressions, we obtain the number of ways in which a set of size N
can be partitioned in the manner described.
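The expression (1.14) may be written in a more convenient form. It
is clear by use of the definition of $\binom{N}{k}$ that

$$\binom{N}{k_1}\binom{N-k_1}{k_2} = \frac{N!}{k_1!\,(N-k_1)!} \cdot \frac{(N-k_1)!}{k_2!\,(N-k_1-k_2)!} = \frac{N!}{k_1!\,k_2!\,(N-k_1-k_2)!}. \tag{1.15}$$

Next, one obtains

$$\binom{N}{k_1}\binom{N-k_1}{k_2}\binom{N-k_1-k_2}{k_3} = \frac{N!}{k_1!\,k_2!\,k_3!\,(N-k_1-k_2-k_3)!}.$$

Continuing in this manner, one finds that (1.14) is equal to

$$\frac{N!}{k_1!\,k_2!\cdots k_r!}. \tag{1.16}$$

Quantities of the form of (1.16) arise frequently, and a special notation is
introduced to denote them. For any integer N, and r nonnegative integers
$k_1, k_2, \ldots, k_r$ whose sum is N, we define the multinomial coefficient:

$$\binom{N}{k_1\; k_2\; \cdots\; k_r} = \frac{N!}{k_1!\,k_2!\cdots k_r!}. \tag{1.17}$$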
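The equality of (1.14) and (1.16) can be checked against a direct count. In this Python sketch the helpers `multinomial`, `product_of_binomials`, and `ordered_partitions` are hypothetical names of our own; the brute-force count cuts every ordering of the set into consecutive blocks and keeps the distinct results, which reproduces the twelve partitions of example 1E.

```python
from itertools import permutations
from math import comb, factorial


def multinomial(*ks):
    """Multinomial coefficient (1.17): N!/(k_1! k_2! ... k_r!) with N = k_1 + ... + k_r."""
    result = factorial(sum(ks))
    for k in ks:
        result //= factorial(k)  # each partial quotient is an integer
    return result


def product_of_binomials(*ks):
    """The product (1.14): choose the first subset, then the second from what remains, and so on."""
    remaining, result = sum(ks), 1
    for k in ks:
        result *= comb(remaining, k)
        remaining -= k
    return result


def ordered_partitions(S, sizes):
    """Brute force: cut each ordering of S into consecutive blocks of the given sizes."""
    found = set()
    for order in permutations(S):
        blocks, start = [], 0
        for k in sizes:
            blocks.append(frozenset(order[start:start + k]))
            start += k
        found.add(tuple(blocks))
    return found


# Example 1E: {1, 2, 3, 4} into subsets of sizes 1, 2, 1.
parts = ordered_partitions([1, 2, 3, 4], (1, 2, 1))
print(len(parts), multinomial(1, 2, 1), product_of_binomials(1, 2, 1))  # 12 12 12
```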


The multinomial coefficients derive their name from the fact that they are
the coefficients in the expansion of the Nth power of the multinomial form
$a_1 + a_2 + \cdots + a_r$ in terms of powers of $a_1, a_2, \ldots, a_r$:

$$(a_1 + a_2 + \cdots + a_r)^N = \sum_{k_1=0}^{N} \cdots \sum_{k_r=0}^{N} \binom{N}{k_1\; k_2\; \cdots\; k_r} a_1^{k_1} a_2^{k_2} \cdots a_r^{k_r}, \qquad k_1 + k_2 + \cdots + k_r = N. \tag{1.18}$$

It should be noted that the summation in (1.18) is over all nonnegative
integers $k_1, k_2, \ldots, k_r$ which sum to N.

► Example 1F. Bridge hands. The number of different hands a player in
a bridge game can obtain is

$$\binom{52}{13} = 635{,}013{,}559{,}600 \doteq 6.35 \times 10^{11}, \tag{1.19}$$

since a bridge hand constitutes a set of thirteen cards selected from a set of
52. The number of ways in which a bridge deck may be dealt into four
hands (labeled, as is usual, North, West, South, and East) is

$$\binom{52}{13\; 13\; 13\; 13} = \binom{52}{13}\binom{39}{13}\binom{26}{13}\binom{13}{13} = \frac{52!}{(13!)^4} \doteq 5.36 \times 10^{28}. \tag{1.20}$$

The symbol ≐ is used in this book to denote approximate equality. ◄

It should be noted that tables of factorials and logarithms of factorials 
are available and may be used to evaluate expressions such as those in 
(1.20). 
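Where the text suggests tables of factorials, a language with exact big-integer arithmetic can evaluate (1.19) and (1.20) directly. A Python sketch, using only `math.comb` and `math.factorial`:

```python
from math import comb, factorial

hands = comb(52, 13)                         # (1.19)
deals = factorial(52) // factorial(13) ** 4  # (1.20), as a multinomial coefficient

# The product of binomial coefficients in (1.20) gives the same exact integer.
assert deals == comb(52, 13) * comb(39, 13) * comb(26, 13) * comb(13, 13)

print(hands, f"{hands:.2e}")  # 635013559600 6.35e+11
print(f"{deals:.2e}")         # 5.36e+28
```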


EXERCISES 


1.1. A restaurant menu lists 3 soups, 10 meat dishes, 5 desserts, and 3 beverages.
In how many ways can a meal (consisting of soup, meat dish, dessert, and
beverage) be ordered?
1.2. Find the value of (i) (5)s, (ii) (5)?, (iii) 5! (iv) (3. 


1.3. How many subsets of size 3 does a set of size 5 possess? How many 
subsets does a set of size 5 possess? 

1.4. In how many ways can a bridge deck be partitioned into 4 hands, each of 
size 13? 

1.5. Five politicians meet at a party. How many handshakes are exchanged if 
each politician shakes hands with every other politician once and only once? 


1.6. Consider a college professor who every year tells exactly 3 jokes in his 
course. If it is his policy never to tell the same 3 jokes in any year that he 
has told in any other year, what is the minimum number of jokes he will 
tell in 35 years? If it is his policy never to tell the same joke twice, what is 
the minimum number of jokes he will tell in 35 years? 


1.7. In how many ways can a student answer an 8-question, true-false
examination if (i) he marks half the questions true and half the questions false,
(ii) he marks no two consecutive answers the same?


1.8. State, by inspection, the value of 
98 4-3-2 


S4 RIS IG +1, 
19. If (5) = B. find n. I (18) = (, 18 |), find r. 


140. Find the value of () (515). G (331) @ (55 4), aw (,,). 


Explain why (3 3 9) = (3) 




1.11. Evaluate the following sums: 
3 


bi 4 4 Р 3 m Р 
2 2 Ad 2 26 X IE A 
EA i-1j-2 jei jitt i-lj-icl 


i=l 


1.12. Given an alphabet of n symbols, in how many ways can one form words
consisting of exactly k symbols? Consequently, find the number of possible
3-letter words that can be formed in the English language.


1.13. Find the number of 3-letter words that can be formed in the English
language whose first and third letters are consonants and whose middle
letter is a vowel.


1.14. Use (1.11) and the principle of mathematical induction to prove the 
binomial theorem, which is stated by (1.9). 


2. POSING PROBABILITY PROBLEMS MATHEMATICALLY 


The principle that lies at the foundation of the mathematical theory
of probability is the following: to speak of the probability of a random
event A, a probability space on which the event is defined must first be
set up. In this section we show how several problems, which arise
frequently in applied probability theory, may be formulated so as to be
mathematically well posed. The examples discussed also illustrate the use
of combinatorial analysis to solve probability problems that are posed in
the context of finite sample description spaces with equally likely
descriptions.
► Example 2A. An urn problem. Two balls are drawn with replacement
(without replacement) from an urn containing six balls, of which four are
white and two are red. Find the probability that (i) both balls will be
white, (ii) both balls will be the same color, (iii) at least one of the balls
will be white.

Solution: To set up a mathematical model for the experiment described,
assume that the balls in the urn are distinguishable; in particular, assume
that they are numbered 1 to 6. Let the white balls bear numbers 1 to 4,
and let the red balls be numbered 5 and 6.

Let us first consider that the balls are drawn without replacement.
The sample description space S of the experiment is then given by (3.1) of
Chapter 1; more compactly we write

$$S = \{(z_1, z_2)\colon \text{for } i = 1, 2,\ z_i = 1, \ldots, 6, \text{ but } z_1 \neq z_2\}. \tag{2.1}$$

In words, one may read (2.1) as follows: S is the set of all 2-tuples
$(z_1, z_2)$ whose components are any numbers, 1 to 6, subject to the
restriction that


SEC. 2 POSING PROBABILITY PROBLEMS MATHEMATICALLY 43 


no two components of a 2-tuple are equal. The jth component $z_j$ of a
description represents the number of the ball drawn on the jth draw. Now
let A be the event that both balls drawn are white, let B be the event that
both balls drawn are red, and let C be the event that at least one of the
balls drawn is white. The problem at hand can then be stated as one of
finding (i) P[A], (ii) P[A ∪ B], (iii) P[C]. It should be noted that $C = B^c$,
so that P[C] = 1 - P[B]. Further, A and B are mutually exclusive, so
that P[A ∪ B] = P[A] + P[B]. Now

$$A = \{(1,2), (1,3), (1,4), (2,1), (2,3), (2,4), (3,1), (3,2), (3,4), (4,1), (4,2), (4,3)\}$$
$$= \{(z_1, z_2)\colon \text{for } i = 1, 2,\ z_i = 1, \ldots, 4, \text{ but } z_1 \neq z_2\}, \tag{2.2}$$

whereas B = {(5, 6), (6, 5)}. Let us assume that all descriptions in S are
equally likely. Then

$$P[A] = \frac{4 \cdot 3}{6 \cdot 5} = 0.4, \qquad P[B] = \frac{2 \cdot 1}{6 \cdot 5} = 0.066. \tag{2.3}$$


The answers to the questions posed in example 2A are given, in the case
of sampling without replacement, by (i) P[A] = 0.4, (ii) P[A ∪ B] = 0.466,
(iii) P[C] = 0.933. These probabilities have been obtained under the
assumption that the balls in the urn may be regarded as numbered
(distinguishable) and that all descriptions in the sample description space
S given in (2.1) are equally likely. In the case of sampling with replacement,
a similar analysis may be carried out; one obtains the answers

$$P[A] = \frac{4 \cdot 4}{6 \cdot 6} = 0.444, \qquad P[B] = \frac{2 \cdot 2}{6 \cdot 6} = 0.111, \tag{2.4}$$
$$P[A \cup B] = 0.555, \qquad P[C] = 0.888.$$


It is interesting to compare the values obtained by the foregoing model
with values obtained by two other possible models. One might adopt
as a sample description space S = {(W, W), (W, R), (R, W), (R, R)}. This
space corresponds to recording the outcome of each draw as W or R,
depending on whether the outcome of the draw is white or red. If one
were to assume that all descriptions in S were equally likely, then P[A] = 1/4,
P[A ∪ B] = 1/2, P[C] = 3/4. Note that the answers given by this model do
not depend on whether the sampling is done with or without replacement.
One arrives at a similar conclusion if one lets S = {0, 1, 2}, in which 0
signifies that no white balls were drawn, 1 signifies that exactly 1 white
ball was drawn, and 2 signifies that exactly two white balls were drawn.
Under the assumption that all descriptions in S are equally likely, one
would conclude that P[A] = 1/3, P[A ∪ B] = 2/3, P[C] = 2/3. ◄
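The numbered-ball model of example 2A can be checked by enumerating the sample description space. The Python sketch below (our own) assumes, as the text does, that all descriptions are equally likely, and uses `fractions.Fraction` to report exact probabilities.

```python
from fractions import Fraction
from itertools import permutations, product

WHITE = {1, 2, 3, 4}  # balls 5 and 6 are red, as in the solution of example 2A


def urn_probabilities(replacement):
    """Exact (P[A], P[A u B], P[C]) for two draws from balls 1 to 6."""
    balls = range(1, 7)
    S = list(product(balls, repeat=2)) if replacement else list(permutations(balls, 2))

    def prob(event):
        # All descriptions in S are assumed equally likely.
        return Fraction(len(event), len(S))

    A = [z for z in S if z[0] in WHITE and z[1] in WHITE]          # both white
    B = [z for z in S if z[0] not in WHITE and z[1] not in WHITE]  # both red
    C = [z for z in S if z[0] in WHITE or z[1] in WHITE]           # at least one white
    # A and B are mutually exclusive, so P[A u B] = P[A] + P[B].
    return prob(A), prob(A) + prob(B), prob(C)


print(urn_probabilities(replacement=False))  # (Fraction(2, 5), Fraction(7, 15), Fraction(14, 15))
print(urn_probabilities(replacement=True))   # (Fraction(4, 9), Fraction(5, 9), Fraction(8, 9))
```

The decimals in the text (0.466, 0.933, 0.555, 0.888) are these fractions truncated, rather than rounded, to three places.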


The next example illustrates the treatment of problems concerning urns
of arbitrary composition. It also leads to a conclusion that the reader
may find startling if he considers the following formulation of it. Suppose
that at a certain time the milk section of a self-service market is known
to contain 150 quart bottles, of which 100 are fresh. If one assumes that
each bottle is equally likely to be drawn, then the probability is 2/3 that a
bottle drawn from the section will be fresh. However, suppose that one
selects a bottle after fifty other persons have each selected a bottle.
Is one's probability of drawing a fresh bottle changed from what it would
have been had one been the first to draw? By the reasoning employed in
example 2B it can be shown that the probability that the fifty-first bottle
drawn will be fresh is the same as the probability that the first bottle
drawn will be fresh.


► Example 2B. An urn of arbitrary composition. An urn contains M
balls, of which $M_W$ are white and $M_R$ are red. A sample of size 2 is
drawn with replacement (without replacement). What is the probability
that (i) the first ball drawn will be white, (ii) the second ball drawn will
be white, (iii) both balls drawn will be white?

Solution: Let A denote the event that the first ball drawn is white,
B denote the event that the second ball drawn is white, and C denote the
event that both balls drawn are white. It should be noted that C = AB.
Let the balls in the urn be numbered 1 to M, the white balls bearing
numbers 1 to $M_W$ and the red balls bearing numbers $M_W + 1$ to M.

We consider first the case of sampling with replacement. The sample
description space S of the experiment consists of ordered 2-tuples $(z_1, z_2)$,
in which $z_1$ is the number of the ball drawn on the first draw and $z_2$ is
the number of the ball drawn on the second draw. Clearly, $N[S] = M^2$.
A description is in A if and only if its first component is a number 1 to
$M_W$ (meaning a white ball was drawn on the first draw); there are $M_W$
possibilities for the first component and M possibilities for the second
component of a description in A. Consequently, by (1.1), the size of A is
$N[A] = M_W M$. Similarly, $N[B] = M M_W$, since there are M possibilities
for the first component and $M_W$ possibilities for the second component of
a description in B. The reader may verify that the event AB (a white ball
is drawn on both draws) has size $N[AB] = M_W M_W$. Thus in the case of
sampling with replacement one obtains the result, if all descriptions are
equally likely, that

$$P[A] = P[B] = \frac{M_W}{M}, \qquad P[AB] = \left(\frac{M_W}{M}\right)^2. \tag{2.5}$$

We next consider the case of sampling without replacement. The sample
description space of the experiment again consists of ordered 2-tuples
$(z_1, z_2)$, in which $z_j$ (for j = 1, 2) denotes the number of the ball drawn
on the jth draw. As in the case of sampling with replacement, each $z_j$ is
a number 1 to M. However, in sampling without replacement a description
$(z_1, z_2)$ must satisfy the requirement that its components are not the same.
Clearly, $N[S] = (M)_2 = M(M-1)$. Next, $N[A] = M_W(M-1)$, since
there are $M_W$ possibilities for the first component of a description in A
and M - 1 possibilities for the second component of a description in A;
the urn from which the second ball is drawn contains only M - 1 balls.
To compute N[B], we first concentrate our attention on the second
component of a description in B. Since B is the event that the ball drawn
on the second draw is white, there are $M_W$ possibilities for the second
component of a description in B. To each of these possibilities, there are
only M - 1 possibilities for the first component, since the ball which is
to be drawn on the second draw is known to us and cannot be drawn on
the first draw. Thus $N[B] = (M-1)M_W$ by (1.1). The reader may
verify that the event AB has size $N[AB] = M_W(M_W - 1)$. Consequently,
in sampling without replacement one obtains the result, if all descriptions
are equally likely, that

$$P[A] = P[B] = \frac{M_W}{M}, \qquad P[AB] = \frac{M_W(M_W-1)}{M(M-1)}. \tag{2.6}$$


Another way of computing P[B], which the reader may find more
convincing on first acquaintance with the theory of probability, is as
follows. Let $B_1$ denote the event that the first ball drawn is white and
the second ball drawn is white. Let $B_2$ denote the event that the first ball
drawn is red and the second ball drawn is white. Clearly,
$N[B_1] = M_W(M_W - 1)$ and $N[B_2] = (M - M_W)M_W$. Since
$P[B] = P[B_1] + P[B_2]$, we have

$$P[B] = \frac{M_W(M_W-1)}{M(M-1)} + \frac{(M-M_W)M_W}{M(M-1)} = \frac{M_W}{M}.$$

To illustrate the use of (2.5) and (2.6), let us consider an urn containing
M = 6 balls, of which $M_W = 4$ are white. Then P[A] = P[B] = 2/3 and
P[AB] = 4/9 in sampling with replacement, whereas P[A] = P[B] = 2/3 and
P[AB] = 2/5 in sampling without replacement.


The reader may find (2.6) startling. It is natural, in the case of sampling
with replacement, in which P[A] = P[B], that the probability of drawing
a white ball is the same on the second draw as it is on the first draw,
since the composition of the urn is the same in both draws. However,
it seems very unnatural, if not unbelievable, that in sampling without
replacement P[A] = P[B]. The following remarks may clarify the meaning
of (2.6).

Suppose that one desired to regard the event that a white ball is drawn
on the second draw as an event defined on the sample description space,
denoted by S', which consists of all possible outcomes of the second draw.
To begin with, one might write S' = {1, 2, ..., M}. However, how is a
probability function to be defined on the subsets of S' in the case in which
the sample is drawn without replacement? If one knows nothing about
the outcome of the first draw, perhaps one might regard all descriptions
in S' as being equally likely; then, $P[B] = M_W/M$. However, suppose
one knows that a white ball was drawn on the first draw. Then the
descriptions in S' are no longer equally likely; rather, it seems plausible
to assign probability 0 to the description corresponding to the (white)
ball, which is not available on the second draw, and assume the remaining
descriptions to be equally likely. One then computes that the probability
of the event B (that a white ball will be drawn on the second draw), given
that the event A (that a white ball was drawn on the first draw) has
occurred, is equal to $(M_W - 1)/(M - 1)$. Thus $(M_W - 1)/(M - 1)$
represents a conditional probability of the event B (and, in particular, the
conditional probability of B, given that the event A has occurred), whereas
$M_W/M$ represents the unconditional probability of the event B. The
distinction between unconditional and conditional probability is made
precise in section 4. ◄
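Formulas (2.5) and (2.6), and in particular the equality $P[B] = M_W/M$ in sampling without replacement, can be confirmed by enumeration. In this Python sketch, which is our own, balls 1 to $M_W$ are white as in the solution of example 2B, and all descriptions are assumed equally likely.

```python
from fractions import Fraction
from itertools import permutations, product


def draw_probabilities(M, M_W, replacement):
    """Exact (P[A], P[B], P[AB]) for two draws; balls 1 to M_W are white, the rest red."""
    balls = range(1, M + 1)
    S = list(product(balls, repeat=2)) if replacement else list(permutations(balls, 2))
    n_A = sum(1 for z1, z2 in S if z1 <= M_W)                 # first ball white
    n_B = sum(1 for z1, z2 in S if z2 <= M_W)                 # second ball white
    n_AB = sum(1 for z1, z2 in S if z1 <= M_W and z2 <= M_W)  # both white
    # All descriptions are assumed equally likely.
    return tuple(Fraction(n, len(S)) for n in (n_A, n_B, n_AB))


# The illustration in the text: M = 6 balls, M_W = 4 white.
print(draw_probabilities(6, 4, replacement=True))   # (Fraction(2, 3), Fraction(2, 3), Fraction(4, 9))
print(draw_probabilities(6, 4, replacement=False))  # (Fraction(2, 3), Fraction(2, 3), Fraction(2, 5))

# Without replacement, P[B] = M_W / M for every composition tried here.
for M in range(2, 9):
    for M_W in range(M + 1):
        assert draw_probabilities(M, M_W, replacement=False)[1] == Fraction(M_W, M)
```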


The next example we shall consider is a generalization of the celebrated
problem of repeated birthdays. Suppose that one is present in a room in
which there are n people. What is the probability that no two persons in
the room have the same birthday? Let it be assumed that each person
in the room can have as his birthday any one of the 365 days in the year
(ignoring the existence of leap years) and that each day of the year is
equally likely to be the person's birthday. Then selecting a birthday for
each person is the same as selecting a number randomly from an urn
containing M = 365 balls, numbered 1 to 365. It is shown in example 2C
that the probability that no two persons in a room containing n persons
will have the same birthday is given by

$$\frac{(365)_n}{(365)^n} = \left(1 - \frac{1}{365}\right)\left(1 - \frac{2}{365}\right)\cdots\left(1 - \frac{n-1}{365}\right). \tag{2.7}$$

The value of (2.7) for various values of n appears in Table 2A.


TABLE 2A

In a room containing n persons let $P_n$ be the probability
that there are not two or more persons in the room with the
same birthday and let $Q_n$ be the probability that there are
two or more persons with the same birthday.

 n     P_n     Q_n
 4    0.984   0.016
 8    0.926   0.074
12    0.833   0.167
16    0.716   0.284
20    0.589   0.411
22    0.524   0.476
23    0.493   0.507
24    0.462   0.538
28    0.346   0.654
32    0.247   0.753
40    0.109   0.891
48    0.039   0.961
56    0.012   0.988
64    0.003   0.997


From Table 2A one determines a fact that many students find startling
and completely contrary to intuition. How many people must there be
in a room in order for the probability to be greater than 0.5 that at least
two of them will have the same birthday? Students who have been asked
this question have given answers as high as 100, 150, 365, and 730. In
fact, the answer is 23!

► Example 2C. The probability of a repetition in a sample drawn with
replacement. Let a sample of size n be drawn with replacement from an
urn containing M balls, numbered 1 to M. Let P denote the probability
that there are no repetitions in the sample (that is, that all the numbers
in the sample occur just once). Let us show that

$$P = \frac{(M)_n}{M^n} = \left(1 - \frac{1}{M}\right)\left(1 - \frac{2}{M}\right)\cdots\left(1 - \frac{n-1}{M}\right). \tag{2.8}$$

The sample description space S of the experiment of drawing with
replacement a sample of size n from an urn containing M balls, numbered
1 to M, is

$$S = \{(z_1, z_2, \ldots, z_n)\colon \text{for } i = 1, \ldots, n,\ z_i = 1, \ldots, M\}. \tag{2.9}$$

The jth component $z_j$ of a description represents the number of the ball
drawn on the jth draw. The event A that there are no repetitions in the
sample is the set of all n-tuples in S, none of whose components are equal.
The size of A is given by $N[A] = (M)_n$, since for any description in A
there are M possibilities for its first component, M - 1 possibilities for
its second component, and so on. The size of S is $N[S] = M^n$. If we
assume that all descriptions in S are equally likely, then (2.8) follows. ◄
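Formula (2.8) is straightforward to evaluate numerically. The following Python sketch (our own; the function name is hypothetical) uses the product form of (2.8), which avoids forming the very large numbers $(365)_n$ and $365^n$, to reproduce rows of Table 2A and to find the smallest n for which $Q_n$ exceeds 0.5.

```python
def prob_no_repetition(M, n):
    """P in (2.8): the product of (1 - j/M) for j = 1, ..., n - 1."""
    p = 1.0
    for j in range(1, n):
        p *= 1 - j / M
    return p


# A few rows of Table 2A, with M = 365 days.
for size in (4, 23, 64):
    P_n = prob_no_repetition(365, size)
    print(size, round(P_n, 3), round(1 - P_n, 3))

# Smallest room size n for which Q_n = 1 - P_n exceeds 0.5.
n = 1
while 1 - prob_no_repetition(365, n) <= 0.5:
    n += 1
print(n)  # 23
```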


► Example 2D. Repeated random digits. Another application of (2.8) is
to the problem of repeated random digits. Consider the following
experiment. Take any telephone directory and open it to any page. Choose
100 telephone numbers from the page. Count the numbers whose last
four digits are all different. If it is assumed that each of the last four
digits is chosen (independently) from the numbers 0 to 9 with equal
probability, then the probability that the last four digits of a randomly
chosen telephone number will be different is given by (2.8), with n = 4
and M = 10. The probability is $(10)_4/10^4 = 0.504$. ◄
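The value 0.504 can be computed exactly with `math.perm` and compared with a simulation. The simulation below is our own illustration and rests on the same assumption as the text, that the last four digits are independent and uniformly distributed on 0 to 9; the seed is arbitrary and fixed only so that the run is repeatable.

```python
import random
from math import perm

# Exact value from (2.8) with M = 10 and n = 4.
exact = perm(10, 4) / 10**4
print(exact)  # 0.504

# Simulation under the assumption that the four digits are independent and uniform on 0..9.
rng = random.Random(2024)  # arbitrary fixed seed, for repeatability only
trials = 100_000
hits = sum(1 for _ in range(trials) if len({rng.randrange(10) for _ in range(4)}) == 4)
estimate = hits / trials
print(f"simulated {estimate:.3f} vs exact {exact:.3f}")
```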


The next example is concerned with a celebrated problem, which we call
here the problem of matches. Suppose you are one of M persons, each
of whom has put his hat in a box. Each person then chooses a hat
randomly from the box. What is the probability that you will choose
your own hat? It seems reasonable that the probability of choosing one's
own hat should be 1/M, since one could have chosen any one of M hats.
However, one might prefer to adopt a more detailed model that takes
account of the fact that other persons may already have selected hats.
A suitable mathematical model is given in example 2E. In section 6 the
model given in example 2E is used to find the probability that at least one
person will choose his own hat. But whether the number of hats involved
is 8, 80, or 8,000,000, the rather startling result obtained is that the
probability is approximately equal to $e^{-1} \doteq 0.368$ that no man will choose
his own hat and approximately equal to $1 - e^{-1} \doteq 0.632$ that at least one
man will choose his own hat.

► Example 2E. Matches (rencontres). Suppose that we have M urns,
numbered 1 to M, and M balls, numbered 1 to M. Let one ball be inserted
in each urn. If a ball is put into the urn bearing the same number as
the ball, a match is said to have occurred. In section 6 formulas are
given (for each integer n = 0, 1, ..., M) for the probability that exactly
n matches will occur. Here we compute, for k = 1, 2, ..., M, the probability
of the event $A_k$ that a match will occur in the kth urn. The probability
$P[A_k]$ corresponds, in the case of the M persons selecting their hats
randomly from a box, to the probability that the kth person will select
his own hat.

To write the sample description space S of the experiment of distributing
M balls in M urns, let $z_j$ represent the number of the ball inserted in the
jth urn (for j = 1, ..., M). Then S is the set of M-tuples $(z_1, z_2, \ldots, z_M)$,
in which each component $z_j$ is a number 1 to M, but no two components
are equal. The event $A_k$ is the set of descriptions $(z_1, \ldots, z_M)$ in S such
that $z_k = k$; in symbols, $A_k = \{(z_1, z_2, \ldots, z_M)\colon z_k = k\}$. It is clear that
$N[A_k] = (M-1)!$ and $N[S] = M!$. If it is assumed that all descriptions
in S are equally likely, then $P[A_k] = 1/M$. Thus we have proved that the
probability of a person's choosing his own hat does not depend on whether
he is the first, second, or even the last person to choose a hat. ◄
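For small M the claim $P[A_k] = 1/M$ can be checked by listing all M! placements. The Python sketch below is our own; besides $P[A_k]$ it also computes the probability of no match at all, so that it can be compared with the value $e^{-1} \doteq 0.368$ quoted above (derived in the text in section 6).

```python
from fractions import Fraction
from itertools import permutations
from math import exp


def match_probabilities(M):
    """Return ([P[A_1], ..., P[A_M]], P[no match]) with all M! placements equally likely."""
    # z[j - 1] is the number of the ball inserted in urn j.
    S = list(permutations(range(1, M + 1)))
    p_match = [
        Fraction(sum(1 for z in S if z[k - 1] == k), len(S)) for k in range(1, M + 1)
    ]
    n_none = sum(1 for z in S if all(z[j - 1] != j for j in range(1, M + 1)))
    return p_match, Fraction(n_none, len(S))


p_match, no_match = match_probabilities(6)
print(set(p_match))  # {Fraction(1, 6)}
print(round(float(no_match), 4), round(exp(-1), 4))  # 0.3681 0.3679
```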


Sample description spaces in which the descriptions are subsets and 
partitions rather than n-tuples are systematically discussed in section 5. 
The following example illustrates the ideas. 


p» Example 2F. How to tell a prediction from a guess. In order to verify 
the contention of the existence of extrasensory perception, the following 
experiment is sometimes performed. Eight cards, four red and four black, 
are shuffled, and then each is looked at successively by the experimenter. 
In another room the subject of study attempts to guess whether the card 
looked at by the experimenter is red or black. He is required to say 
“black” four times and “red” four times. If the subject of the study has 
no extrasensory perception, what is the probability that the subject will 
“guess” correctly the colors of exactly six of eight cards? Notice that 
the problem is unchanged if the subject claimed the gift of "prophecy" 
and, before the cards were dealt, stated the order in which he expected 
the cards to appear. 

Solution: Let us call the first card looked at by the experimenter
card 1; similarly, for $k = 1, 2, \ldots, 8$, let the $k$th card looked at by the
experimenter be called card $k$. To describe the subject's response during
the course of the experiment, we write the subset $\{z_1, z_2, z_3, z_4\}$ of the
numbers $\{1, 2, 3, 4, 5, 6, 7, 8\}$, which consists of the numbers of all the
cards the subject said were red. The sample description space $S$ then
consists of all subsets of size 4 of the set $\{1, 2, 3, 4, 5, 6, 7, 8\}$. Therefore,
$N[S] = \binom{8}{4} = 70$. The event $A$ that the subject made exactly six correct
guesses may be represented as the set of those subsets $\{z_1, z_2, z_3, z_4\}$,
exactly three of whose members are equal to the numbers of cards that


50 BASIC PROBABILITY THEORY CH. 2 


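were, in fact, red. To compute the size of $A$, we notice that the three
numbers in a description in $A$ corresponding to a correct guess may be
chosen in $\binom{4}{3}$ ways, whereas the one number in a description in $A$
corresponding to an incorrect guess may be chosen in $\binom{4}{1}$ ways.
(Naming three red cards correctly as red forces the subject to name three
black cards correctly as black, so such a description yields exactly six
correct guesses.) Consequently, $N[A] = \binom{4}{3}\binom{4}{1} = 16$ and
$$P[A] = \frac{\binom{4}{3}\binom{4}{1}}{\binom{8}{4}} = \frac{16}{70} = 0.229.$$
◄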
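The subset description can be checked directly: enumerate all $\binom{8}{4}$ ways of calling four cards "red" and count those giving exactly six correct calls. This is a brute-force sketch assuming, for illustration, that cards 1 to 4 are the red ones (by symmetry any fixed choice gives the same answer); `prob_exactly_correct` is a hypothetical helper name. Note that $16/70 = 8/35$.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def prob_exactly_correct(red_cards, n_cards=8, calls=4, correct=6):
    """Illustrative helper: P[exactly `correct` right calls] when the subject
    names `calls` cards as red, every such subset being equally likely."""
    cards = range(1, n_cards + 1)
    space = list(combinations(cards, calls))  # S: all subsets of size 4
    hits = 0
    for said_red in space:
        # A call is right when "red"/"black" agrees with the card's true color.
        right = sum(1 for c in cards if (c in said_red) == (c in red_cards))
        if right == correct:
            hits += 1
    return Fraction(hits, len(space))


p = prob_exactly_correct(red_cards={1, 2, 3, 4})
print(p, Fraction(comb(4, 3) * comb(4, 1), comb(8, 4)))  # 8/35 8/35
```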


EXERCISES 


In solving the following problems, state carefully any assumptions made. 
In particular, describe the probability space on which the events, whose 
probabilities are being found, are defined. 


2.1. Two balls are drawn with replacement (without replacement) from an urn
containing 8 balls, of which 5 are white and 3 are black. Find the
probability that (i) both balls will be white, (ii) both balls will be the same color,
(iii) at least 1 of the balls will be white.


2.2. An urn contains 3 red balls, 4 white balls, and 5 blue balls. Another urn
contains 5 red balls, 6 white balls, and 7 blue balls. One ball is selected
from each urn. What is the probability that (i) both will be white, (ii) both
will be the same color?


2.3. An urn contains 6 balls, numbered 1 to 6. Find the probability that 2 balls
drawn from the urn with replacement (without replacement) (i) will have
a sum equal to 7, (ii) will have a sum equal to $k$, for each integer $k$ from
2 to 12.


2.4. Two fair dice are tossed. What is the probability that the sum of the dice
will be (i) equal to 7, (ii) equal to $k$, for each integer $k$ from 2 to 12?
2.5. An urn contains 10 balls, bearing numbers 0 to 9. A sample of size 3 is
drawn with replacement (without replacement). By placing the numbers
in a row in the order in which they are drawn, an integer 0 to 999 is formed.
What is the probability that the number thus formed is divisible by 39?
Note: regard 0 as being divisible by 39.

2.6. Four probabilists arrange to meet at the Grand Hotel in Paris. It happens
that there are 4 hotels with that name in the city. What is the probability
that all the probabilists will choose different hotels?

2.7. What is the probability that among the 32 persons who were President of
the United States in the period 1789-1952 at least 2 were born on the same
day of the year?


SEC. 3 THE NUMBER OF “SUCCESSES” IN A SAMPLE 51 


2.8. Given a group of 4 people, find the probability that at least 2 among them 
have (i) the same birthday, (ii) the same birth month. 


2.9. Suppose that among engineers there are 12 fields of specialization and that 
there is an equal number of engineers in each field. Given a group of 6 
engineers, what is the probability that no 2 among them will have the same 
field of specialization? 


2.10. Two telephone numbers are chosen randomly from a telephone book.
What is the probability that the last digits of each are (i) the same, (ii)
different?


2.11. Two friends, Irwin and Danny, are members of a group of 6 persons who 
have placed their hats on a table. Each person selects a hat randomly from 
the hats on the table. What is the probability that (i) Irwin will get his own 
hat, (ii) both Irwin and Danny will get their own hats, (iii) at least one, 
either Irwin or Danny, will get his own hat? 

2.12. Two equivalent decks of 52 different cards are put into random order 
(shuffled) and matched against each other by successively turning over 
one card from each deck simultaneously. What is the probability that 
(i) the first, (ii) the 52nd card turned over from each deck will coincide? 
What is the probability that both the first and 52nd cards turned over from 
each deck will coincide? 


2.13. In example 2F what is the probability that the subject will guess correctly 
the colors of (i) exactly 5 of the 8 cards, (ii) 4 of the 8 cards? 


2.14. In his paper “Probability Preferences in Gambling," American Journal of 
Psychology, Vol. 66 (1953), pp. 349-364, W. Edwards tells of a farmer who 
came to the psychological laboratory of the University of Washington. The 
farmer brought a carved whalebone with which he claimed that he could 
locate hidden sources of water. The following experiment was conducted to 
test the farmer's claim. He was taken into a room in which there were 10 
covered cans. He was told that 5 of the 10 cans contained water and 5 were
empty. The farmer's task was to divide the cans into 2 equal groups, 1 
group containing all the cans with water, the other containing those with- 
out water. What is the probability that the farmer correctly put at least 
3 cans into the water group just by chance? 


3. THE NUMBER OF “SUCCESSES” IN A SAMPLE 


A basic problem of the theory of sampling is the following. An urn
contains $M$ balls, of which $M_W$ are white (where $M_W \le M$) and $M_R =
M - M_W$ are red. A sample of size $n$ is drawn either without replacement
(in which case $n \le M$), or with replacement. Let $k$ be an integer between
0 and $n$ (that is, $k = 0, 1, 2, \ldots$, or $n$). What is the probability that the
sample will contain exactly $k$ white balls?

This problem is a prototype of many problems, which, as stated, do not 
involve the drawing of balls from an urn. 




► Example 3A. Acceptance sampling of a manufactured product. Consider
the problem of acceptance sampling of a manufactured product. Suppose
we are to inspect a lot of size $M$ of manufactured articles of some kind,
such as light bulbs, screws, resistors, or anything else that is manufactured
to meet certain standards. An article that is below standard is said to
be defective. Let a sample of size $n$ be drawn without replacement from
the lot. A basic role in the theory of statistical quality control is played
by the following problem. Let $k$ and $M_D$ be integers such that $k \le n$
and $M_D \le M$. What is the probability that the sample will contain $k$
defective articles if the lot contains $M_D$ defective articles? This is the


same problem as that stated above, with defective articles playing the role 
of white balls. 


► Example 3B. A sample-minded game warden. Consider a fisherman who
has caught 10 fish, 2 of which were smaller than the law permits to be
caught. A game warden inspects the catch by examining two that he
selects randomly from among the fish. What is the probability that he
will not select either of the undersized fish? This problem is an example
of those previously stated, involving sampling without replacement, with
undersized fish playing the role of white balls, and $M = 10$, $M_W = 2$,
$n = 2$, $k = 0$. By (3.1), the required probability is given by
$\binom{2}{0}(2)_0(8)_2/(10)_2 = 56/90 = 28/45$. ◄
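As a sanity check on the formula, the warden's sample space can be listed in full: all $(10)_2 = 90$ ordered pairs of distinct fish. The sketch assumes, for illustration only, that fish 1 and 2 are the undersized ones.

```python
from fractions import Fraction
from itertools import permutations

# Assumed labelling: fish 1-2 are undersized ("white balls"), fish 3-10 are legal.
fish = range(1, 11)
undersized = {1, 2}

samples = list(permutations(fish, 2))  # ordered samples without replacement
no_small = [s for s in samples if not undersized.intersection(s)]
p = Fraction(len(no_small), len(samples))
print(p)  # 28/45
```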


► Example 3C. A sample-minded die. Another problem, which may be
viewed in the same context but which involves sampling with replacement,
is the following. Let a fair die be tossed four times. What is the probability
that one will obtain the number 3 exactly twice in the four tosses? This 
problem can be stated as one involving the drawing (with replacement) of 
balls from an urn containing balls numbered 1 to 6, among which ball 
number 3 is white and the other balls, red (or, more strictly, nonwhite). 
In the notation of the problem introduced at the beginning of the section
this problem corresponds to the case $M = 6$, $M_W = 1$, $n = 4$, $k = 2$.
By (3.2), the required probability is given by $\binom{4}{2}\,1^2\,5^2/6^4 = 150/1296 = 25/216$. ◄
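The same value follows from listing all $6^4$ equally likely outcomes of four tosses and counting those with exactly two 3s; this brute-force sketch treats "3" as the white ball and every other face as nonwhite.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=4))  # 6**4 equally likely outcomes
favorable = sum(1 for t in outcomes if t.count(3) == 2)
p = Fraction(favorable, len(outcomes))
print(favorable, len(outcomes), p)  # 150 1296 25/216
```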


To emphasize the wide variety of problems, of which that stated at the 
beginning of the section is a prototype, it may be desirable to avoid 
references to white balls in the statement of the solution of the problem 
(although not in the statement of the problem itself) and to speak instead 
of scoring "successes." Let us say that we score a success whenever we 
draw a white ball. Then the problem can be stated as that of finding, for 
$k = 0, 1, \ldots, n$, the probability of the event $A_k$ that one will score
exactly $k$ successes when one draws a sample of size $n$ from an urn




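containing $M$ balls, of which $M_W$ are white. We now show that in the
case of sampling without replacement
$$P[A_k] = \binom{n}{k}\frac{(M_W)_k\,(M - M_W)_{n-k}}{(M)_n}, \qquad k = 0, 1, \ldots, n, \tag{3.1}$$
whereas in the case of sampling with replacement
$$P[A_k] = \binom{n}{k}\frac{M_W^{\,k}\,(M - M_W)^{n-k}}{M^n}, \qquad k = 0, 1, \ldots, n. \tag{3.2}$$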

It should be noted that, in sampling without replacement, if the number
$M_W$ of white balls in the urn is less than the size $n$ of the sample drawn,
then clearly $P[A_k] = 0$ for $k = M_W + 1, \ldots, n$. Equation (3.1) embodies
this fact, in view of (1.10).

Before indicating the proofs of (3.1) and (3.2), let us state some useful 
alternative ways of writing these formulas. For many purposes it is 
useful to express (3.1) and (3.2) in terms of
$$p = \frac{M_W}{M}, \tag{3.3}$$
the proportion of white balls in the urn. The formula for $P[A_k]$ can then
be compactly written, in the case of sampling with replacement,
$$P[A_k] = \binom{n}{k} p^k (1-p)^{n-k}. \tag{3.4}$$


Equation (3.4) is a special case of a very general result, called the 
binomial law, which is discussed in detail in section 3 of Chapter 3. The 
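expression given by (3.1) for the probability of $k$ successes in a sample of
size $n$ drawn without replacement may be expressed in terms of $p$ by
$$P[A_k] = \binom{n}{k} p^k (1-p)^{n-k}\,
\frac{\left(1 - \frac{1}{M_W}\right)\left(1 - \frac{2}{M_W}\right)\cdots\left(1 - \frac{k-1}{M_W}\right)
\left(1 - \frac{1}{M - M_W}\right)\cdots\left(1 - \frac{n-k-1}{M - M_W}\right)}
{\left(1 - \frac{1}{M}\right)\left(1 - \frac{2}{M}\right)\cdots\left(1 - \frac{n-1}{M}\right)}. \tag{3.5}$$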



Consequently, one sees that in the case in which $k/M_W$, $(n - k)/(M - M_W)$,
and $n/M$ are small (say, less than 0.1), the probability
of the event $A_k$ is approximately the same in sampling without replacement
as it is in sampling with replacement.
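A small numerical illustration of this remark: the sketch below implements (3.1) and (3.2) exactly with Python's `fractions.Fraction`, `math.comb`, and `math.perm` (Python 3.8+; `math.perm(m, j)` is the falling factorial $(m)_j$ and returns 0 when $j > m$). The urn sizes compared are assumed values chosen for illustration, one case where all three ratios are small and one where they are not; the function names are ours.

```python
from fractions import Fraction
from math import comb, perm


def p_without(M, M_W, n, k):
    """(3.1): P[A_k] for a sample of size n drawn without replacement."""
    return Fraction(comb(n, k) * perm(M_W, k) * perm(M - M_W, n - k), perm(M, n))


def p_with(M, M_W, n, k):
    """(3.2): P[A_k] for a sample of size n drawn with replacement."""
    return Fraction(comb(n, k) * M_W**k * (M - M_W) ** (n - k), M**n)


# k/M_W = 0.02, (n-k)/(M-M_W) < 0.01, n/M = 0.01: the answers nearly agree.
print(float(p_without(1000, 50, 10, 1)), float(p_with(1000, 50, 10, 1)))
# k/M_W = 0.5, n/M = 0.5: the answers differ noticeably.
print(float(p_without(20, 10, 10, 5)), float(p_with(20, 10, 10, 5)))
```

The same two functions reproduce examples 3B and 3C, and `p_without` returns 0 whenever $k > M_W$, as noted after (3.2).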

Another way of writing (3.1) is in the computationally simpler form
$$P[A_k] = \frac{\binom{M_W}{k}\binom{M - M_W}{n-k}}{\binom{M}{n}}. \tag{3.6}$$
It may be verified algebraically that (3.1) and (3.6) agree. In section 5
we discuss the intuitive meaning of (3.6).
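Before doing the algebra, one can spot-check the agreement numerically. This sketch compares (3.1) and (3.6) in exact rational arithmetic for every urn with at most 12 balls; it is a finite check, not a substitute for the algebraic verification, and the function names are illustrative.

```python
from fractions import Fraction
from math import comb, perm


def via_31(M, M_W, n, k):
    """(3.1) with falling factorials; math.perm(m, j) = (m)_j, 0 if j > m."""
    return Fraction(comb(n, k) * perm(M_W, k) * perm(M - M_W, n - k), perm(M, n))


def via_36(M, M_W, n, k):
    """(3.6) with binomial coefficients; math.comb(m, j) = 0 if j > m."""
    return Fraction(comb(M_W, k) * comb(M - M_W, n - k), comb(M, n))


# Exhaustive numerical check over all small urns (not an algebraic proof).
checked = 0
for M in range(1, 13):
    for M_W in range(M + 1):
        for n in range(M + 1):
            for k in range(n + 1):
                assert via_31(M, M_W, n, k) == via_36(M, M_W, n, k)
                checked += 1
print(checked, "cases agree")
```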
We turn now to the proof of (3.1). Let the balls in the urn be numbered
1 to $M$, the white balls bearing numbers 1 to $M_W$. The sample description
space $S$ then consists of $n$-tuples $(z_1, z_2, \ldots, z_n)$, in which, for $i =
1, \ldots, n$, $z_i$ is a number 1 to $M$, subject to the condition that no two
components of an $n$-tuple may be the same. The size of $S$ is given by
$N[S] = (M)_n$. The event $A_k$ consists of all sample descriptions in $S$,
exactly $k$ components of which are numbers 1 to $M_W$. To compute the
size of $A_k$, we first compute the size of events $B_J$ of the following form.
Let $J = \{j_1, j_2, \ldots, j_k\}$ be a subset of size $k$ of the set of integers $\{1, 2, \ldots, n\}$.
Define $B_J$ as the event that white balls are drawn in and only in those
draws whose draw numbers are in $J$; that is, $B_J$ is the set of descriptions
$(z_1, z_2, \ldots, z_n)$ whose $j_1$st, $j_2$nd, ..., $j_k$th components are numbers 1 to
$M_W$ and whose remaining components are numbers $M_W + 1$ to $M$. The
size of $B_J$ may be obtained immediately by means of the basic principle
of combinatorial analysis. We obtain $N[B_J] = (M_W)_k (M - M_W)_{n-k}$,
since there are $(M_W)_k$ ways in which white balls may be assigned to the
$k$ components of a description in $B_J$ in which white balls occur and
$(M - M_W)_{n-k}$ ways in which nonwhite balls may be assigned to the
remaining $(n - k)$ components. Now, by (1.8), there are $\binom{n}{k}$ subsets $J$ of size $k$
of the integers $\{1, 2, \ldots, n\}$. For any two distinct such subsets $J$ and $J'$ the
corresponding events $B_J$ and $B_{J'}$ are mutually exclusive. Further, the event $A_k$
may be regarded as the union, over such subsets $J$, of the events $B_J$.
Consequently, the size of $A_k$ is given by $N[A_k] = \binom{n}{k}(M_W)_k(M - M_W)_{n-k}$.
If we assume that all the descriptions in $S$ are equally likely, we obtain
(3.1). To prove (3.2), we use a similar argument.
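The decomposition of $A_k$ into the disjoint events $B_J$ can be watched at work on a small urn. The sketch assumes $M = 6$, $M_W = 4$, $n = 3$, $k = 2$ (the urn of example 3D below); it enumerates $S$, checks that every $B_J$ has size $(M_W)_k (M - M_W)_{n-k}$, and that $A_k$ has $\binom{n}{k}$ times that many elements. `white_draws` is a hypothetical helper name.

```python
from itertools import combinations, permutations
from math import comb, perm

M, M_W, n, k = 6, 4, 3, 2   # assumed small urn: balls 1..4 white, 5..6 not

# S: ordered samples without replacement, N[S] = (M)_n
S = list(permutations(range(1, M + 1), n))
assert len(S) == perm(M, n)


def white_draws(z):
    """Illustrative helper: the set of draw numbers (1-based) that gave a white ball."""
    return frozenset(i + 1 for i, ball in enumerate(z) if ball <= M_W)


A_k = [z for z in S if len(white_draws(z)) == k]

for J in combinations(range(1, n + 1), k):
    B_J = [z for z in S if white_draws(z) == frozenset(J)]
    # N[B_J] = (M_W)_k (M - M_W)_{n-k}, the same for every J
    assert len(B_J) == perm(M_W, k) * perm(M - M_W, n - k)

# A_k is the disjoint union of the comb(n, k) events B_J
assert len(A_k) == comb(n, k) * perm(M_W, k) * perm(M - M_W, n - k)
print(len(A_k), len(S))  # 72 120
```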

► Example 3D. The difference between $k$ successes and successes on $k$
specified draws. Let a sample of size 3 be drawn without replacement from




an urn containing six balls, of which four are white. The probability that
the first and second balls drawn will be white and the third ball black is
equal to $(4)_2(2)_1/(6)_3$. However, the probability that the sample will
contain exactly two white balls is equal to $\binom{3}{2}(4)_2(2)_1/(6)_3$. If the sample
is drawn with replacement, then the probability of white balls on the
first and second draws and a black ball on the third is equal to $4^2\,2^1/6^3$,
whereas the probability of exactly two white balls in the sample is equal
to $\binom{3}{2}\,4^2\,2^1/6^3$. ◄
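A brute-force sketch of example 3D, assuming for illustration that balls 1 to 4 are white and balls 5 and 6 black. It computes, under both sampling schemes, the probability of the specified pattern (white, white, black) and of exactly two whites in any order; the ratio between the two is the factor $\binom{3}{2} = 3$ in each case.

```python
from fractions import Fraction
from itertools import permutations, product

WHITE = {1, 2, 3, 4}   # assumed labelling: balls 1-4 white, 5-6 black
balls = range(1, 7)


def probs(samples):
    """Return (P[white, white, black in that order], P[exactly two white])."""
    samples = list(samples)
    colors = [tuple(b in WHITE for b in s) for s in samples]
    specified = sum(1 for c in colors if c == (True, True, False))
    exactly_two = sum(1 for c in colors if sum(c) == 2)
    return Fraction(specified, len(samples)), Fraction(exactly_two, len(samples))


without = probs(permutations(balls, 3))      # (4)_2 (2)_1 / (6)_3 and 3x that
with_repl = probs(product(balls, repeat=3))  # 4^2 2 / 6^3 and 3x that
print(without, with_repl)
```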


► Example 3E. Acceptance sampling. Suppose that we wish to inspect a
certain product by means of a sample drawn from a lot. Probability 
theory cannot tell us how to constitute a lot or how to inspect the sample 
or even how large a sample to draw. Rather, probability theory can tell 
us the consequences of certain actions, given that certain assumptions are 
true. Suppose we decide to inspect the product by forming lots of size 
1000, from which we will draw a sample of size 100. Each of the items 
in the sample is classified as defective or nondefective. It is unreasonable 
to demand that the lot be perfect. Consequently, we may decide to accept 
the lot if the sample contains one or fewer defectives and to reject the lot 
if two or more of the items inspected are defective. The question naturally 
arises as to whether this acceptance scheme is too lax or too stringent; 
perhaps we ought to demand that the sample contain no defectives, or 
perhaps we ought to permit the sample to contain two or fewer defectives. 
In order to decide whether or not a given acceptance scheme is suitable, 
we must determine the probability P that a randomly chosen lot will be 
accepted. However, we do not possess sufficient information to compute 
P. In order to compute the probability P of acceptance of a lot, using a 
given acceptance sampling plan, we must know the proportion p of 
defectives in a lot. Thus P is a function of p, and we write P(p) to denote 
the probability of acceptance of a lot in which the proportion of defectives 
is $p$. Now for the acceptance sampling plan, which consists in drawing
a sample of size 100 from a lot of size 1000 and accepting the lot if the sample
contains one or fewer defectives, $P(p)$ is given by
$$P(p) = \frac{(1000q)_{100}}{(1000)_{100}} + 100\,\frac{(1000p)\,(1000q)_{99}}{(1000)_{100}}, \tag{3.7}$$


where we have let q = 1 — p. The graph of P(p) as a function of p is 
called the operating characteristic curve, or OC curve, of the acceptance 
sampling plan. In Fig. 3A we have plotted the OC curve for the sampling 


plan described. We see that the probability of accepting a lot is 0.95
if it contains about 0.4% defective items, whereas the probability of
accepting a lot is only 0.50 if it contains 1.7% defective items.


[Fig. 3A. An operating characteristic, or OC, curve. $P(p)$ is the probability of accepting a lot containing proportion defective $p$, for sample size $n = 100$ and acceptance number 1. Horizontal axis: $p$, from 0 to 0.1; vertical axis: $P(p)$.]
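The points of the OC curve can be computed exactly from (3.7). The sketch below evaluates each term in the equivalent binomial-coefficient form (3.6), with $D = 1000p$ defectives in the lot; lot size 1000, sample size 100, and acceptance number 1 are the plan's values, while the helper name `accept_prob` and the grid of $D$ values are our own choices. The exact values it prints can be compared with the readings 0.95 and 0.50 quoted above.

```python
from math import comb


def accept_prob(defectives, lot=1000, n=100, c=1):
    """Illustrative helper: P(p) of (3.7) with p = defectives / lot, i.e. the
    probability that a sample of n drawn without replacement contains at most
    c defectives (each term written in the form (3.6))."""
    ok = sum(comb(defectives, k) * comb(lot - defectives, n - k) for k in range(c + 1))
    return ok / comb(lot, n)  # exact integer ratio, rounded once to float


for d in (0, 4, 10, 17, 30, 50):
    print(f"p = {d / 1000:.3f}   P(p) = {accept_prob(d):.3f}")
```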
► Example 3F. Winning a prize in a lottery. Consider a lottery that sells
$n^2$ tickets and awards $n$ prizes. If one buys $n$ tickets, what is the probability
of winning a prize?

Solution: The probability $P_1$ of winning a prize is related to the
probability $P_0$ of not winning a prize by $P_1 = 1 - P_0$. Now $P_0$ is the
probability that a sample of size $n$ drawn without replacement from an
urn containing $n^2$ tickets will not contain any of $n$ specified tickets.
Consequently,
$$P_0 = \frac{(n^2 - n)_n}{(n^2)_n}. \tag{3.8}$$




In the case that $n = 10$, $P_0 = (90)_{10}/(100)_{10} = 0.330$, so that $P_1 = 0.670$.
In the case that $n$ is large, it may be shown that, approximately,
$$P_0 \approx e^{-1} = \frac{1}{2.718} = 0.368, \qquad P_1 \approx 1 - e^{-1} = 0.632. \tag{3.9}$$
◄
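Formula (3.8) is easy to evaluate as a product of $n$ ratios, which also shows numerically how $P_0$ approaches $e^{-1}$ as $n$ grows. A sketch in floating point; `p_no_prize` is a hypothetical name and the values of $n$ are arbitrary illustrations.

```python
from math import exp


def p_no_prize(n):
    """Illustrative helper: P_0 = (n^2 - n)_n / (n^2)_n from (3.8),
    computed as the product of (n^2 - n - i) / (n^2 - i), i = 0..n-1."""
    N = n * n
    p = 1.0
    for i in range(n):
        p *= (N - n - i) / (N - i)
    return p


for n in (10, 100, 1000):
    p0 = p_no_prize(n)
    print(n, round(p0, 3), round(1 - p0, 3))
print("e^-1 =", round(exp(-1), 3))
```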


In the foregoing we have considered the problem of drawing a sample 
from an urn containing balls of only two colors. However, one may 
desire to consider urns containing balls of more than two colors. In 
theoretical exercises 3.1 to 3.3 we obtain formulas for this case. The 
following example illustrates the ideas involved. 


► Example 3G. Sampling from three plumbers. Consider a town in which
there are three plumbers, whom we call A, B, and C. On a certain day six
residents of the town telephone for a plumber. If each resident selects a
plumber at random from the telephone directory, what is the probability
that three residents will call A, two residents will call B, and one resident
will call C?

Solution: For $j = 1, 2, \ldots, 6$, let $z_j = A$, $B$, or $C$, depending on
whether the plumber called by the $j$th resident is A, B, or C. The sample
description space $S$ of the observation is then a space of 6-tuples,
$S = \{(z_1, z_2, \ldots, z_6): \text{for } j = 1, \ldots, 6,\ z_j = A, B, \text{ or } C\}$. Clearly, $N[S] = 3^6 = 729$.
Next, the event $E$ that three residents call A, two call B, and one calls C
has size
$$N[E] = \binom{6}{3\;2\;1} = \frac{6!}{3!\,2!\,1!} = 60, \tag{3.10}$$
so that $P[E] = 60/3^6 = 60/729 = 0.082$. To prove (3.10), we note that the number
of samples of size 6 which contain three calls for A, two calls for B, and
one call for C is the number of ways one can partition the set $\{1, 2, 3, 4, 5, 6\}$
into three ordered subsets of sizes 3, 2, and 1, respectively. ◄
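A direct enumeration of the $3^6$ six-tuples confirms both the multinomial count (3.10) and the probability $60/729 = 20/243$. The sketch encodes each description as a string of the letters A, B, C, one per resident.

```python
from fractions import Fraction
from itertools import product
from math import factorial

# S: all 6-tuples (z_1, ..., z_6), z_j = plumber called by resident j
S = list(product("ABC", repeat=6))
E = [z for z in S if (z.count("A"), z.count("B"), z.count("C")) == (3, 2, 1)]

multinomial = factorial(6) // (factorial(3) * factorial(2) * factorial(1))
p_E = Fraction(len(E), len(S))
print(len(S), len(E), multinomial, p_E, round(float(p_E), 3))  # 729 60 60 20/243 0.082
```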


THEORETICAL EXERCISES 


3.1. Consider an urn containing $M$ balls of $r$ different colors. Let $M_1$,
$M_2, \ldots, M_r$ denote, respectively, the number of balls of color 1, color
2, ..., color $r$. Show that the probability that a sample of size $n$ will contain
$k_1$ balls of color 1, $k_2$ balls of color 2, ..., $k_r$ balls of color $r$, where
$k_1 + k_2 + \cdots + k_r = n$, is given, in the case of sampling with replacement, by
$$\binom{n}{k_1\,k_2\,\cdots\,k_r}\frac{M_1^{k_1} M_2^{k_2}\cdots M_r^{k_r}}{M^n} \tag{3.11}$$
and, in the case of sampling without replacement, by
$$\binom{n}{k_1\,k_2\,\cdots\,k_r}\frac{(M_1)_{k_1}(M_2)_{k_2}\cdots(M_r)_{k_r}}{(M)_n}. \tag{3.12}$$
Hint: The number of samples of size $n$ that contain $k_1$ balls of color 1,
$k_2$ balls of color 2, ..., $k_r$ balls of color $r$ is equal to the number of ways
one can partition a set of size $n$ into $r$ ordered subsets of sizes $k_1, k_2, \ldots,
k_r$, respectively.

3.2. Show that, in terms of the proportions
$$p_1 = \frac{M_1}{M}, \quad p_2 = \frac{M_2}{M}, \quad \ldots, \quad p_r = \frac{M_r}{M}, \tag{3.13}$$
one may express (3.11) by
$$\binom{n}{k_1\,k_2\,\cdots\,k_r} p_1^{k_1} p_2^{k_2}\cdots p_r^{k_r}. \tag{3.14}$$

3.3. Consider an urn containing $n$ balls, each of a different color. Let $r$ be any
integer. Show that the probability that a sample of size $r$ drawn with replace-
ment will contain $r_1$ balls of color 1, $r_2$ balls of color 2, ..., $r_n$ balls of
color $n$, where $r_1 + r_2 + \cdots + r_n = r$, is given by
$$\frac{1}{n^r}\binom{r}{r_1\,r_2\,\cdots\,r_n}.$$

3.4. An urn contains $M$ balls, numbered 1 to $M$. Let $N$ numbers be designated
"lucky," where $N \le M$. Let a sample of size $n$ be drawn either without
replacement (in which case $n \le M$), or with replacement. Show that the
probability that the sample will contain exactly $k$ balls with "lucky"
numbers is given by (3.1) and (3.2), respectively, with $M_W$ replaced by $N$.


EXERCISES


3.1. An urn contains 52 balls, numbered 1 to 52. Suppose that numbers 1
through 13 are considered "lucky." A sample of size 2 is drawn from the
urn with replacement (without replacement). What is the probability that
(i) both balls drawn will be "lucky," (ii) neither ball drawn will be "lucky,"
(iii) at least 1 of the balls drawn will be "lucky," (iv) exactly 1 of the balls
drawn will be "lucky"?

3.2. An urn contains 52 balls, numbered 1 to 52. Suppose that the numbers
1, 14, 27, and 40 are considered "lucky." A sample of size 13 is drawn from
the urn with replacement (without replacement). What is the probability
that the sample will contain (i) exactly 1 "lucky" number, (ii) at least 1
"lucky" number, (iii) exactly 4 "lucky" numbers?

3.3. A man tosses a fair coin 10 times. Find the probability that he will have
(i) heads on the first 5 tosses, tails on the second 5 tosses, (ii) heads on
tosses 1, 3, 5, 7, 9, tails on tosses 2, 4, 6, 8, 10, (iii) 5 heads and 5 tails, (iv) at
least 5 heads, (v) no more than 5 heads.


3.4. A group of $n$ men toss fair coins simultaneously. Find the probability that
the $n$ coins (i) are all heads, (ii) are all tails, (iii) contain exactly 1 head,
(iv) contain exactly 1 tail, (v) are all alike. Evaluate these probabilities for
$n = 2, 3, 4, 5$.

3.5. Consider 3 urns; urn I contains 2 white and 4 red balls, urn II contains
8 white and 4 red balls, urn III contains 1 white and 3 red balls. One ball
is selected from each urn. Find the probability that the sample drawn will
contain exactly 2 white balls.

3.6. A box contains 24 bulbs, 4 of which are known to be defective and the
remainder of which are known to be nondefective. What is the probability
that 4 bulbs selected at random from the box will be nondefective?

3.7. A box contains 50 razor blades, 5 of which are known to be used, the
remainder unused. What is the probability that 5 razor blades selected
from the box will be unused?

3.8. A fisherman caught 10 fish, 3 of which were smaller than the law permits
to be caught. A game warden inspects the catch by examining 2, which he
selects at random among the fish. What is the probability that he will not
select any undersized fish?

3.9. A professional magician named Sebastian claimed to be able to "read
minds." In order to test his claims, an experiment is conducted with 5
cards, numbered 1 to 5. A person concentrates on the numbers of 2 of
the cards, and Sebastian attempts to "read his mind" and to name the 2
cards. What is the probability that Sebastian will correctly name the 2
cards, under the assumption that he is merely guessing?

3.10. Find approximately the probability that a sample of 100 items drawn
from a lot of 1000 items contains 1 or fewer defective items if the pro-
portion of the lot that is defective is (i) 0.01, (ii) 0.02, (iii) 0.05.

3.11. The contract between a manufacturer of electrical equipment (such as
resistors or condensers) and a purchaser provides that out of each lot of
100 items 2 will be selected at random and subjected to a test. In negotia-
tions for the contract the following two acceptance sampling plans are
considered. Plan (a): reject the lot if both items tested are defective;
otherwise accept the lot. Plan (b): accept the lot if both items tested are
good; otherwise reject the lot. Obtain the operating characteristic curves
of each of these plans. Which plan is more satisfactory to (i) the purchaser,
(ii) the manufacturer? If you were the purchaser, would you consider
either of the plans acceptable?

3.12. Consider a lottery that sells 25 tickets, and offers (i) 3 prizes, (ii) 5 prizes.
If one buys 5 tickets, what is the probability of winning a prize?

3.13. Consider an electric fixture (such as Christmas tree lights) containing 5
electric light bulbs which are connected so that none will operate if any
one of them is defective. If the light bulbs in the fixture are selected
randomly from a batch of 1000 bulbs, 100 of which are known to be
defective, find the probability that all the bulbs in the electric fixture will
operate.




3.14. An urn contains 52 balls, numbered 1 to 52. Find the probability that a 
sample of 13 balls drawn without replacement will contain (i) each of the 
numbers 1 to 13, (ii) each of the numbers 1 to 7. 


3.15. An urn contains balls of 4 different colors, each color being represented 
by the same number of balls. Four balls are drawn, with replacement. 


What is the probability that at least 3 different colors are represented in the
sample?


3.16. From a committee of 3 Romans, 4 Babylonians, and 5 Philistines a sub-
committee of 4 is selected by lot. Find the probability that the subcommittee
will consist of (i) 2 Romans and 2 Babylonians, (ii) 1 Roman, 1 Babylonian,
and 2 Philistines, (iii) 4 Philistines.


3.17. Consider a town in which there are 3 plumbers; on a certain day 4 
residents telephone for a plumber. If each resident selects a plumber at 
random from the telephone directory, what is the probability that (i) all 
plumbers will be telephoned, (ii) exactly 1 plumber will be telephoned? 


3.18. Six persons, among whom are A and B, are arranged at random (i) in a
row, (ii) in a ring. What is the probability that (a) A and B will stand
next to each other, (b) A and B will be separated by one and only one
person?


4. CONDITIONAL PROBABILITY 


In section 3 we have been concerned with problems of the following [...]
containing 100 light bulbs, of which five [...] probability that a bulb
selected from the box [...] extension of this problem is the following. [...]
from a box containing 100 light bulbs, of [...]

[...] by the conditional probability of the event $B$, given the event $A$,
denoted by $P[B \mid A]$, we mean intuitively the probability [...] assumption
that $A$ has occurred. In other words, [...] evaluation of the probability of
$B$ in the light of [...] occurred.


SEC. 4 CONDITIONAL PROBABILITY 61 


random phenomenon in which the events $A$ and $B$ are defined. Let $N_A$
denote the number of occurrences of the event $A$ in the $N$ occurrences of
the random phenomenon. Similarly, let $N_B$ denote the number of
occurrences of $B$. Next, let $N_{AB}$ denote the number of occurrences of
the random phenomenon in which both the events $A$ and $B$ occur.


► Example 4A. Thirty observed samples of size 2. Consider the following
results of thirty repetitions of the experiment of drawing, without replace-
ment, a sample of size 2 from an urn containing six balls, numbered 1 to 6:

(1, 6), (4, 5), (1, 4), (6, 3), (?, 2), (4, 3)
(3, 1), (?, ?), (?, ?), (?, 3), (4, 5), (5, 6)
(5, 4), (3, ?), (6, 3), (?, ?), (?, 5), (6, 4)
(?, 3), (6, 2), (4, ?), (1, 5), (4, 6), (6, 3)
(2, 3), (5, 2), (3, 6), (6, 4), (6, 4), (1, 2)


If the balls numbered 1 to 4 are colored white, and the balls numbered 5 
and 6 are colored red, then the outcome of the thirty trials can be recorded 


as follows: 


(W, R), (W, R), (W, W), (R, W), (W, W), (W, W)
(W, W), (R, W), (W, W), (W, W), (W, R), (R, R)
(R, W), (W, W), (R, W), (R, R), (W, R), (R, W)
(W, W), (R, W), (W, W), (W, R), (W, R), (R, W)
(W, W), (R, W), (W, R), (R, W), (R, W), (W, W)


Let $N_A$ denote the number of experiments in which a white ball appeared
on the first draw. Let $N_B$ denote the number of experiments in which a
white ball appeared on the second draw, and let $N_{AB}$ denote the number
of experiments in which white balls appeared on both draws. By direct
enumeration, one obtains $N_A = 18$, $N_B = 21$, and $N_{AB} = 11$. ◄
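The "direct enumeration" can be reproduced from the color record above. The sketch transcribes the thirty (first, second) color pairs as two-letter strings, W for white and R for red, and counts $N_A$, $N_B$, and $N_{AB}$ together with the relative frequency $N_{AB}/N_A$ used later in example 4B.

```python
# Color record of the thirty trials in example 4A (W = white, R = red),
# one two-letter string (first draw, second draw) per trial.
trials = """
WR WR WW RW WW WW
WW RW WW WW WR RR
RW WW RW RR WR RW
WW RW WW WR WR RW
WW RW WR RW RW WW
""".split()

N = len(trials)
N_A = sum(t[0] == "W" for t in trials)   # white on the first draw
N_B = sum(t[1] == "W" for t in trials)   # white on the second draw
N_AB = sum(t == "WW" for t in trials)    # white on both draws
print(N, N_A, N_B, N_AB)  # 30 18 21 11
print(round(N_AB / N_A, 3))  # 0.611
```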


In terms of the frequency definition, the unconditional probabilities of
the events $A$, $B$, and $AB$ are given by
$$P[A] = \frac{N_A}{N}, \qquad P[B] = \frac{N_B}{N}, \qquad P[AB] = \frac{N_{AB}}{N}. \tag{4.1}$$


On the other hand, the conditional probability $P[B \mid A]$ of the event $B$,
given the event $A$, represents the fraction of experiments in which $A$
occurred in which $B$ also occurred; in symbols,
$$P[B \mid A] = \frac{N_{AB}}{N_A}. \tag{4.2}$$




It should be noted that (4.2) makes sense only if $N_A$ is not zero. If $N_A$
is zero, we must regard $P[B \mid A]$ as being undefined.

Equation (4.2) represents the meaning of conditional probability from the
frequency point of view. Now, (4.2) may be rewritten in a manner that will
indicate a formal definition of $P[B \mid A]$, which will embody the properties
of conditional probability as it is intuitively conceived. We rewrite (4.2)
(in the case that $N_A$ is not zero):
$$P[B \mid A] = \frac{N_{AB}/N}{N_A/N} = \frac{P[AB]}{P[A]}. \tag{4.3}$$


In analogy with (4.3) we now give the following formal definition of
$P[B \mid A]$:


FORMAL DEFINITION OF CONDITIONAL PROBABILITY. Let $A$ and $B$ be two
events on a sample description space $S$, on the subsets of which is defined
a probability function $P[\cdot]$. The conditional probability of the event $B$,
given the event $A$, denoted by $P[B \mid A]$, is defined by
$$P[B \mid A] = \frac{P[AB]}{P[A]} \qquad \text{if } P[A] > 0, \tag{4.4}$$
and if $P[A] = 0$, then $P[B \mid A]$ is undefined.


► Example 4B. Computing a conditional probability. Consider the
problem of drawing, without replacement, a sample of size 2 from an
urn containing four white and two red balls. Let $A$ denote the event that
the first ball drawn is white, and $B$ the event that the second ball drawn
is white. Let us compute $P[B \mid A]$. By (2.6), it follows that
$P[AB] = (4 \cdot 3)/(6 \cdot 5) = 2/5$, whereas $P[A] = 4/6 = 2/3$. Therefore,
$P[B \mid A] = (2/5)/(2/3) = 3/5 = 0.6$, which accords with intuition: given that
a white ball was drawn first, the second ball is drawn from an urn containing
five balls, of which three are white. Compare these theoretically computed
probabilities with the observed relative frequencies in example 4A. We have
$N_A/N = 18/30 = 0.600$, $N_{AB}/N = 11/30 = 0.367$, and $N_{AB}/N_A = 11/18 = 0.611$. ◄
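Definition (4.4) translates directly into code once the sample description space is listed. In this sketch (assuming balls 1 to 4 are the white ones and all 30 ordered samples are equally likely), `P` counts descriptions and `cond` is a hypothetical helper that returns `None` when $P[A] = 0$, mirroring "undefined" in (4.4).

```python
from fractions import Fraction
from itertools import permutations

WHITE = {1, 2, 3, 4}                    # assumed: balls 1-4 white, 5-6 red
S = list(permutations(range(1, 7), 2))  # 30 equally likely ordered samples


def P(event):
    """Equally likely model: N[event] / N[S]."""
    return Fraction(sum(1 for s in S if event(s)), len(S))


def cond(b, a):
    """Illustrative helper for (4.4): P[B | A] = P[AB] / P[A], None if P[A] = 0."""
    p_a = P(a)
    if p_a == 0:
        return None  # conditional probability undefined
    return P(lambda s: a(s) and b(s)) / p_a


def first_white(s):
    return s[0] in WHITE


def second_white(s):
    return s[1] in WHITE


p_A = P(first_white)
p_AB = P(lambda s: first_white(s) and second_white(s))
print(p_A, p_AB, cond(second_white, first_white))  # 2/3 2/5 3/5
```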
We next give a formula that may help to clarify the difference between
the unconditional and the conditional probability of an event $B$. We
have, for any events $B$ and $A$ such that $P[A] > 0$ and $P[A^c] > 0$,
$$P[B] = P[B \mid A]\,P[A] + P[B \mid A^c]\,P[A^c]. \tag{4.5}$$
Equation (4.5) is proved as follows. From the definition of conditional
probability given by (4.4) one has the basic formula
$$P[AB] = P[A]\,P[B \mid A]. \tag{4.6}$$




Similarly, one has $P[A^cB] = P[A^c]\,P[B \mid A^c]$. Now, the events $AB$ and $A^cB$
are mutually exclusive, and their union is $B$. Consequently, $P[B] =
P[AB] + P[A^cB]$, and substituting (4.6) and its analogue for $A^c$ yields (4.5).


► Example 4C. A numerical verification of (4.5). Consider again the
problem in example 4B. One has $P[A] = 2/3$. Therefore, $P[A^c] = 1/3$. Next,
one has $P[B \mid A] = 3/5$. However, from this it does not follow that $P[B \mid A^c] =
2/5$. Rather, by use of definition (4.4), $P[A^cB] = (2 \cdot 4)/(6 \cdot 5) = 4/15$, so that
$P[B \mid A^c] = (4/15)/(1/3) = 4/5$; one may also obtain
this result by intuitive reasoning (which is made rigorous in section 4 of
Chapter 3), for if a white ball were not drawn on the first draw, there
would be four white balls among the five balls in the urn from which the
second draw would be made. Then, by (4.5),
$$P[B] = \left(\tfrac{3}{5}\right)\left(\tfrac{2}{3}\right) + \left(\tfrac{4}{5}\right)\left(\tfrac{1}{3}\right) = \frac{10}{15} = \frac{2}{3}.$$
◄
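The verification of (4.5) in example 4C can be repeated mechanically on the same 30-point sample description space (balls 1 to 4 assumed white): compute both conditional probabilities from (4.4), combine them as in (4.5), and compare with $P[B]$ counted directly. It also shows that $P[B \mid A^c]$ is not $1 - P[B \mid A]$.

```python
from fractions import Fraction
from itertools import permutations

WHITE = {1, 2, 3, 4}                    # assumed: balls 1-4 white, 5-6 red
S = list(permutations(range(1, 7), 2))  # 30 equally likely ordered samples


def P(event):
    return Fraction(sum(1 for s in S if event(s)), len(S))


def A(s):    # white on the first draw
    return s[0] in WHITE


def A_c(s):  # complement of A: red on the first draw
    return not A(s)


def B(s):    # white on the second draw
    return s[1] in WHITE


p_B_given_A = P(lambda s: A(s) and B(s)) / P(A)
p_B_given_Ac = P(lambda s: A_c(s) and B(s)) / P(A_c)
rhs = p_B_given_A * P(A) + p_B_given_Ac * P(A_c)  # right side of (4.5)
print(p_B_given_A, p_B_given_Ac, rhs, P(B))  # 3/5 4/5 2/3 2/3
```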


Example 4D yields conclusions which students, on first acquaintance, 
often think startling and contrary to intuition. 


► Example 4D. Consider a family with two children. Assume that each
child is as likely to be a boy as it is to be a girl. What is the conditional
probability that both children are boys, given that (i) the older child is a
boy, (ii) at least one of the children is a boy?

Solution: Let $A$ be the event that the older child is a boy, and let $B$ be
the event that the younger child is a boy. Then $A \cup B$ is the event that
at least one of the children is a boy, and $AB$ is the event that both children
are boys. The probability that both children are boys, given that the
older is a boy, is equal to (since $AB$ is a subevent of $A$)
$$P[AB \mid A] = \frac{P[AB]}{P[A]} = \frac{1/4}{1/2} = \frac{1}{2}. \tag{4.7}$$
The probability that both children are boys, given that at least one of
them is a boy, is equal to
$$P[AB \mid A \cup B] = \frac{P[AB]}{P[A \cup B]} = \frac{1/4}{3/4} = \frac{1}{3}. \tag{4.8}$$
◄
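The "startling" contrast between (4.7) and (4.8) is easiest to see on the four equally likely descriptions (older, younger). A short enumeration sketch, with B for boy and G for girl:

```python
from fractions import Fraction
from itertools import product

S = list(product("BG", repeat=2))  # (older, younger), each with probability 1/4


def P(event):
    return Fraction(sum(1 for s in S if event(s)), len(S))


def older_boy(s):      # event A
    return s[0] == "B"


def younger_boy(s):    # event B
    return s[1] == "B"


def both_boys(s):      # event AB
    return older_boy(s) and younger_boy(s)


def at_least_one(s):   # event A ∪ B
    return older_boy(s) or younger_boy(s)


# AB is contained in A and in A ∪ B, so P[AB ∩ A] = P[AB ∩ (A ∪ B)] = P[AB].
print(P(both_boys) / P(older_boy))     # (4.7): 1/2
print(P(both_boys) / P(at_least_one))  # (4.8): 1/3
```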


► Example 4E. The outcome of a draw, given the outcome of a sample. Let
a sample of size 4 be drawn with replacement (without replacement)
from an urn containing twelve balls, of which eight are white. Find the
conditional probability that the ball drawn on the third draw was white,
given that the sample contains three white balls.

Solution: Let $A$ be the event that the sample contains exactly three
white balls, and let $B$ be the event that the ball drawn on the third draw


TRU 
64 BASIC PROBABILITY THEORY CH. Z 


as white. The problem at hand is to find P[B| А]. In the case of sampling 
№. š 
with replacement 


\ 3! 3 
Qe o E Lo s 
(4.9) Р[А] = "34^ , P[AB] = "029^ > (0 x 
3 
In the case of sampling without replacement 
3 3) 
MICE (2) HET 
410 РМ] = А35, rug Peja =E, 
i (12), (12), i: 
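Formulas (4.9) and (4.10) can be evaluated exactly with integer arithmetic. In the sketch below, `math.perm(N, j)` plays the role of the falling factorial $(N)_j$; the parameter names follow the notation of the text, with the values of example 4E as defaults, and the function name is illustrative.

```python
from fractions import Fraction
from math import comb, perm


def third_draw_white_given_k(with_replacement: bool, M=12, M_W=8, n=4, k=3) -> Fraction:
    """P[B | A]: A = exactly k white balls in the sample, B = a specified draw is white."""
    M_R = M - M_W  # nonwhite balls
    if with_replacement:
        # (4.9): positions of the white draws, then fill each position freely.
        denom = M ** n
        p_a = Fraction(comb(n, k) * M_W ** k * M_R ** (n - k), denom)
        p_ab = Fraction(comb(n - 1, k - 1) * M_W ** k * M_R ** (n - k), denom)
    else:
        # (4.10): same position counts, falling factorials (N)_j = perm(N, j).
        denom = perm(M, n)
        p_a = Fraction(comb(n, k) * perm(M_W, k) * perm(M_R, n - k), denom)
        p_ab = Fraction(comb(n - 1, k - 1) * perm(M_W, k) * perm(M_R, n - k), denom)
    return p_ab / p_a


print(third_draw_white_given_k(True), third_draw_white_given_k(False))  # 3/4 3/4
```

The factors that depend on the urn cancel in the quotient, leaving $\binom{n-1}{k-1}/\binom{n}{k} = k/n$ in both cases.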


More generally, it may be proved (see theoretical exercise 4.5) that if a sample of size $n$ contains $k$ white balls then the probability is $k/n$ that on any specified draw a white ball was drawn. Note that this result is the same, no matter what the composition of the urn and irrespective of whether the sample was drawn with or without replacement. In a sense, one may express the results just stated by the statement that on any given draw all balls in the sample are equally likely to occur. Many students attempt to solve the problem given here by reasoning that on the third draw any one of the four balls in the sample could have occurred and of these three are white, so that the (conditional) probability of a white ball on the third draw is $\frac{3}{4}$, in agreement with the foregoing equations. However, this line of reasoning consists in making assumptions in addition to those made in our derivation of these equations. It is desirable to prove that these new assumptions are a consequence of the model postulated in deriving (4.9) and (4.10).
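The claim that $P[B_j \mid A_k] = k/n$ regardless of the urn's composition and of the sampling scheme can at least be spot-checked by brute force for small urns. The sketch below enumerates every equally likely ordered sample; it is a numerical check of the statement, not the proof requested in the theoretical exercise cited above, and the function name is hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product


def p_white_on_draw(M, M_W, n, j, k, with_replacement):
    """P[B_j | A_k] by enumeration; returns None when A_k is impossible.

    Balls 0, ..., M_W - 1 are white; B_j = draw j is white, A_k = exactly k white.
    """
    balls = range(M)
    samples = product(balls, repeat=n) if with_replacement else permutations(balls, n)
    count_a = count_ab = 0
    for sample in samples:
        if sum(ball < M_W for ball in sample) == k:
            count_a += 1
            count_ab += sample[j - 1] < M_W
    return Fraction(count_ab, count_a) if count_a else None


# Every feasible (j, k) gives k/n, for either sampling scheme.
for replace in (True, False):
    for k in range(4):
        for j in (1, 2, 3):
            p = p_white_on_draw(6, 2, 3, j, k, replace)
            if p is not None:
                assert p == Fraction(k, 3)
print("k/n confirmed for M=6, M_W=2, n=3")
```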


THEORETICAL EXERCISES

4.1. Prove the following relations, in which it is assumed that $P[C] > 0$. These relations illustrate the fact that all general theorems on probabilities are also valid for conditional probabilities with respect to any particular event $C$.
(i) $P[S \mid C] = 1$, where $S$ is the certain event.
(ii) $P[A \mid C] = 1$ if $C$ is a subevent of $A$.
(iii) $P[A \mid C] = 0$ if $P[A] = 0$.
(iv) $P[A \cup B \mid C] = P[A \mid C] + P[B \mid C] - P[AB \mid C]$.
(v) $P[A^c \mid C] = 1 - P[A \mid C]$.


4.2. Let $B$ be an event of positive probability. Show that for any event $A$,
(i) $A \subset B$ implies $P[A \mid B] = P[A]/P[B]$,
(ii) $B \subset A$ implies $P[A \mid B] = 1$.

4.3. Let $A$ and $B$ be two events, each with positive probability. Show that statement (i) is true, whereas statements (ii) and (iii) are, in general, false:
(i) $P[A \mid B] + P[A^c \mid B] = 1$.
(ii) $P[A \mid B] + P[A \mid B^c] = 1$.
(iii) $P[A \mid B] + P[A^c \mid B^c] = 1$.

4.4. An urn contains $M$ balls, of which $M_W$ are white (where $M_W < M$). Let a sample of size $n$ be drawn from the urn either with replacement or without replacement. For $j = 1, 2, \ldots, n$ let $B_j$ be the event that the ball drawn on the $j$th draw is white. For $k = 1, 2, \ldots, n$ let $A_k$ be the event that the sample (of size $n$) contains exactly $k$ white balls. Show that $P[B_j \mid A_k] = k/n$. Express this fact in words.

4.5. An urn contains $M$ balls, of which $M_W$ are white. $n$ balls are drawn and laid aside (not replaced in the urn), their color unnoted. Another ball is drawn (it is assumed that $n$ is less than $M$). What is the probability that it will be white? Hint: Compare example 2B.

EXERCISES

4.1. A man tosses 2 fair coins. What is the conditional probability that he has tossed 2 heads, given that he has tossed at least 1 head?

4.2. An urn contains 12 balls, of which 4 are white. Five balls are drawn and laid aside (not replaced in the urn), their color unnoted.
(i) Another ball is drawn. What is the probability that it will be white?
(ii) A sample of size 2 is drawn. What is the probability that it will contain exactly one white ball?
(iii) What is the conditional probability that it will contain exactly 2 white balls, given that it contains at least 1 white ball?

4.3. In the milk section of a self-service market there are 150 quarts, 100 of which are fresh, and 50 of which are a day old.
(i) If 2 quarts are selected, what is the probability that both will be fresh?
(ii) Suppose that the 2 quarts are selected after 50 quarts have been removed from the section. What is the probability that both will be fresh?
(iii) What is the conditional probability that both will be fresh, given that at least 1 of them is fresh?

4.4. The student body of a certain college is composed of 60% men and 40% women. The following proportions of the students smoke cigarettes: 40% of the men and 60% of the women. What is the probability that a student who is a cigarette smoker is a man? A woman?


4.5. Consider two events $A$ and $B$ such that $P[A] = \frac{1}{4}$, $P[B \mid A] = \frac{1}{2}$, $P[A \mid B] = \frac{1}{4}$. For each of the following 4 statements, state whether it is true or false: (i) the events $A$ and $B$ are mutually exclusive, (ii) $A$ is a subevent of $B$, (iii) $P[A^c \mid B^c] = \frac{3}{4}$, (iv) $P[A \mid B] + P[A \mid B^c] = 1$.

4.6. Consider an urn containing 12 balls, of which 8 are white. Let a sample of size 4 be drawn with replacement (without replacement). What is the conditional probability that the first ball drawn will be white, given that the sample contained exactly (i) 2 white balls, (ii) 3 white balls?

4.7. Consider an urn containing 6 balls, of which 4 are white. Let a sample of size 3 be drawn with replacement (without replacement). Let $A$ denote the event that the sample contains exactly 2 white balls, and let $B$ denote the event that the ball drawn on the third draw is white. Verify numerically that (4.5) holds in this case.

4.8. Consider an urn containing 12 balls, of which 8 are white. Let a sample of size 4 be drawn with replacement (without replacement). What is the conditional probability that the second and third balls drawn will be white, given that the sample contains exactly three white balls?

4.9. Consider 3 urns; urn I contains 2 white and 4 red balls, urn II contains 8 white and 4 red balls, urn III contains 1 white and 3 red balls. One ball is selected from each urn. What is the probability that the ball selected from urn II will be white, given that the sample drawn contains exactly 2 white balls?

4.10. … exactly 3 white balls? … exactly 3 white balls, … the ball placed in the urn was white?

4.11. … What is the (conditional) probability that the … given that (i) the sum is odd, (ii) the sum is …, (iii) the outcome of the first die was odd, (iv) the outcome …, (v) the outcome of at least one of the dice was odd, (vi) the 2 dice had the same outcomes, (vii) the 2 dice had different … the 2 dice was 13?

4.12. … ace of spades, (ii) at least one … need not be equal.

4.13. … on each of which is marked off a side 1 and side 2. … side 1 is colored black and side 2 is colored … What is the (conditional) probability … the card selected is red the other side of the card will … probability that if side 1 of the card selected is examined … side 2 of the card will be black? Hint: Compare example 4D.


4.14. A die is loaded in such a way that the probability of a given number turning up is proportional to that number (for instance, a 4 is twice as probable as a 2).
(i) What is the probability of rolling a 5, given that an odd number turns up?
(ii) What is the probability of rolling an even number, given that a number less than 5 turns up?


5. UNORDERED AND PARTITIONED SAMPLES— 
OCCUPANCY PROBLEMS 


We have insisted in the foregoing that the experiment of drawing a 
sample from an urn should always be performed in such a manner that 
one may speak of the first ball drawn, the second ball drawn, etc. Now 
it is clear that sampling need not be done in this way. Especially if one 
is sampling without replacement, the balls in the sample may be extracted 
from the urn not one at a time but all at once. For example, as in a 
bridge game, one may extract 13 cards from a deck of cards and examine 
them after all have been received and rearranged. If $n$ balls are extracted all at once from an urn containing $M$ balls, numbered 1 to $M$, the outcome of the experiment is a subset $\{z_1, z_2, \ldots, z_n\}$ of the numbers 1 to $M$, rather than an $n$-tuple $(z_1, z_2, \ldots, z_n)$ whose components are numbers 1 to $M$.

We are thus led to define the notions of ordered and unordered samples. A sample is said to be ordered if attention is paid to the order in which the numbers (on the balls in the sample) appear. A sample is said to be unordered if attention is paid only to the numbers that appear in the sample but not to the order in which they appear. The sample description space of the random experiment of drawing (with or without replacement) an ordered sample of size $n$ from an urn containing $M$ balls numbered 1 to $M$ consists of $n$-tuples $(z_1, z_2, \ldots, z_n)$, in which each component $z_j$ is a number 1 to $M$. The sample description space $S$ of the random experiment of drawing (with or without replacement) an unordered sample of size $n$ from an urn containing $M$ balls numbered 1 to $M$ consists of sets $\{z_1, z_2, \ldots, z_n\}$ of size $n$, in which each member $z_j$ is a number 1 to $M$.
► Example 5A. All possible unordered samples of size 3 from an urn containing four balls. In example 1A we listed all possible sample descriptions in the case of the random experiment of drawing, with or without replacement, an ordered sample of size 3 from an urn containing four balls. We now list all possible unordered samples. If the sampling is done without replacement, then the possible unordered samples of size 3 that can be drawn are

{1, 2, 3}   {1, 2, 4}   {1, 3, 4}   {2, 3, 4}


If the sampling is done with replacement, then the possible unordered 
samples of size 3 that can be drawn are 


{1, 1, 1}   {2, 2, 2}   {3, 3, 3}   {4, 4, 4}
{1, 1, 2}   {2, 2, 3}   {3, 3, 4}
{1, 1, 3}   {2, 2, 4}   {3, 4, 4}
{1, 1, 4}   {2, 3, 3}
{1, 2, 2}   {2, 3, 4}
{1, 2, 3}   {2, 4, 4}
{1, 2, 4}
{1, 3, 3}
{1, 3, 4}
{1, 4, 4}   ◄
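Unordered samples can be generated with Python's `itertools.combinations` (without replacement) and `itertools.combinations_with_replacement` (with replacement). Both yield sorted tuples in lexicographic order, which here stand in for the sets $\{z_1, \ldots, z_n\}$ of the text (a tuple with a repeated entry encodes a sample in which a ball was drawn more than once). The sketch reproduces the two lists of example 5A and checks their sizes against the counts $\binom{M}{n}$ and $\binom{M+n-1}{n}$ for $N[S]$ given in the following paragraph; the function name is illustrative.

```python
from itertools import combinations, combinations_with_replacement
from math import comb


def unordered_samples(M: int, n: int, with_replacement: bool) -> list[tuple[int, ...]]:
    """All unordered samples of size n from balls 1..M, as sorted tuples."""
    balls = range(1, M + 1)
    if with_replacement:
        return list(combinations_with_replacement(balls, n))
    return list(combinations(balls, n))


without = unordered_samples(4, 3, with_replacement=False)
with_repl = unordered_samples(4, 3, with_replacement=True)
print(without)         # [(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)]
print(len(with_repl))  # 20

# Sizes agree with N[S] = C(M, n) and N[S] = C(M + n - 1, n) for small M, n.
for M in range(1, 7):
    for n in range(1, 6):
        assert len(unordered_samples(M, n, False)) == comb(M, n)
        assert len(unordered_samples(M, n, True)) == comb(M + n - 1, n)
```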


We next compute the size of $S$. In the case of unordered samples drawn without replacement, it is clear that $N[S] = \binom{M}{n}$, since the number of unordered samples of size $n$ is the same as the number of subsets of size $n$ of the set $\{1, 2, \ldots, M\}$. In the case of unordered samples drawn with replacement, one may show (see theoretical exercise 5.2) that $N[S] = \binom{M+n-1}{n}$.

In section 3 the problem of the number of successes in a sample was considered under the assumption that the sample was ordered. Suppose now that an unordered sample of size $n$ is drawn from an urn containing $M$ balls, of which $M_W$ are white. Let us find, for $k = 0, 1, \ldots, n$, the probability of the event $A_k$ that the sample will contain exactly $k$ white balls. We consider first the case of sampling without replacement. Then $N[S] = \binom{M}{n}$. Next, $N[A_k] = \binom{M_W}{k}\binom{M - M_W}{n - k}$, since any description $\{z_1, z_2, \ldots, z_n\}$ in $A_k$ contains $k$ white balls, which can be chosen in $\binom{M_W}{k}$ ways, and $(n - k)$ nonwhite balls, which can be chosen in $\binom{M - M_W}{n - k}$ ways. Consequently, in the case of unordered samples drawn without replacement,

(5.1)   $P[A_k] = \dfrac{\binom{M_W}{k}\binom{M - M_W}{n - k}}{\binom{M}{n}}.$


It is readily verified that the value of $P[A_k]$, given by the model of unordered samples, agrees with the value of $P[A_k]$, given by the model of ordered samples, in the case of sampling without replacement. However, in the case of sampling with replacement the probability that an unordered sample of size $n$, drawn from an urn containing $M$ balls, of which $M_W$ are white, will contain exactly $k$ white balls is equal to

(5.2)   $P[A_k] = \dfrac{\binom{M_W + k - 1}{k}\binom{M - M_W + n - k - 1}{n - k}}{\binom{M + n - 1}{n}},$

which does not agree with the value of $P[A_k]$, given by the model of ordered samples.
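The agreement (without replacement) and disagreement (with replacement) between the two models can be seen numerically. The sketch below evaluates (5.1) and (5.2) alongside the ordered-sample probabilities of section 3, which in this notation read $\binom{n}{k}(M_W)_k(M-M_W)_{n-k}/(M)_n$ and $\binom{n}{k}M_W^k(M-M_W)^{n-k}/M^n$. The function names are illustrative, and $1 \le M_W < M$ is assumed so that every binomial coefficient in (5.2) has a nonnegative upper index.

```python
from fractions import Fraction
from math import comb, perm


def unordered_without(M, M_W, n, k):  # (5.1)
    return Fraction(comb(M_W, k) * comb(M - M_W, n - k), comb(M, n))


def ordered_without(M, M_W, n, k):
    return Fraction(comb(n, k) * perm(M_W, k) * perm(M - M_W, n - k), perm(M, n))


def unordered_with(M, M_W, n, k):  # (5.2); assumes 1 <= M_W < M
    return Fraction(comb(M_W + k - 1, k) * comb(M - M_W + n - k - 1, n - k),
                    comb(M + n - 1, n))


def ordered_with(M, M_W, n, k):
    return Fraction(comb(n, k) * M_W ** k * (M - M_W) ** (n - k), M ** n)


M, M_W, n = 12, 8, 4
for k in range(n + 1):
    assert unordered_without(M, M_W, n, k) == ordered_without(M, M_W, n, k)
print(unordered_with(M, M_W, n, 3), ordered_with(M, M_W, n, 3))  # 32/91 32/81
```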


► Example 5B. Distributing balls among urns (the occupancy problem). Suppose that we are given $M$ urns, numbered 1 to $M$, among which we are to distribute $n$ balls, where $n < M$. What is the probability that each of the urns numbered 1 to $n$ will contain exactly 1 ball?

Solution: Let $A$ be the event that each of the urns numbered 1 to $n$ will contain exactly 1 ball. In order to determine the probability space on which the event $A$ is defined, we must first make assumptions regarding (i) the distinguishability of the balls and (ii) the manner in which the distribution of balls is to be carried out.

If the balls are regarded as being distinguishable (by being labeled with the numbers 1 to $n$), then to describe the results of distributing the balls among the $M$ urns one may write an $n$-tuple $(z_1, z_2, \ldots, z_n)$, whose $j$th component $z_j$ designates the number of the urn in which ball $j$ was deposited. If the balls are regarded as being all alike, and therefore indistinguishable, then to describe the results of distributing $n$ balls among the $M$ urns one may write a set $\{z_1, z_2, \ldots, z_n\}$ of size $n$, in which each member $z_j$ represents the number of an urn into which a ball has been deposited. Thus ordered and unordered samples correspond in the occupancy problem to distributing distinguishable and indistinguishable balls, respectively.

Next, in distributing the balls, one may or may not impose an exclusion 
rule to the effect that in distributing the balls one ball at most may be 
put into any urn. It is clear that imposing an exclusion rule is equivalent
to choosing the urn numbers (sampling) without replacement, since an 
urn may be chosen once at most. If an exclusion rule is not imposed, so 
that in any urn one may deposit as many balls as one pleases, then one is 


choosing the urn numbers (sampling) with replacement. 
Let us now return to the problem of computing $P[A]$. The size of the sample description space is given in Table 5A for each of the various possible cases.

TABLE 5A
The number of ways in which $n$ balls may be distributed into $M$ distinguishable urns

Balls distributed    Distinguishable balls    Indistinguishable balls
Without exclusion    $M^n$                    $\binom{M+n-1}{n}$          With replacement
                     Maxwell-Boltzmann        Bose-Einstein
                     statistics               statistics
With exclusion       $(M)_n$                  $\binom{M}{n}$              Without replacement
                                              Fermi-Dirac statistics
                     Ordered samples          Unordered samples           Samples drawn

The number of ways in which samples of size $n$ may be drawn from an urn containing $M$ distinguishable balls

Next, let us determine the size of $A$. Whether or not an exclusion rule is imposed, we obtain $N[A] = n!$ if the balls are distinguishable and $N[A] = 1$ if the balls are indistinguishable. Consequently, if the balls are distinguishable and distributed without exclusion,

(5.3)   $P[A] = \dfrac{n!}{M^n};$

if the balls are indistinguishable and distributed without exclusion,

(5.4)   $P[A] = \dfrac{1}{\binom{M+n-1}{n}};$

if the balls are distributed with exclusion, it makes no difference whether the balls are considered distinguishable or indistinguishable, since

(5.5)   $P[A] = \dfrac{n!}{(M)_n} = \dfrac{1}{\binom{M}{n}}.$


Each of the different probability models for occupancy problems, described in the foregoing, finds application in statistical physics. Suppose one seeks to determine the equilibrium state of a physical system composed of a very large number $n$ of "particles" of the same nature: electrons, protons, photons, mesons, neutrons, etc. For simplicity, assume that there are $M$ microscopic states in which each of the particles can be (for example, there are $M$ energy levels that a particle can occupy). To describe the macroscopic state of the system, suppose that it suffices to state the $M$-tuple $(n_1, n_2, \ldots, n_M)$ whose $j$th component $n_j$ is the number of "particles" in the $j$th microscopic state. The equilibrium state of the system of particles is defined as that macroscopic state $(n_1, n_2, \ldots, n_M)$ with the highest probability of occurring. To compute the probability of any given macroscopic state, an assumption must be made as to whether or not the particles obey the Pauli exclusion principle (which states that there cannot be more than one particle in any of the microscopic states). If the indistinguishable particles are assumed to obey the exclusion principle, then they are said to possess Fermi-Dirac statistics. If the indistinguishable particles are not required to obey the exclusion principle, then they are said to possess Bose-Einstein statistics. If the particles are assumed to be distinguishable and do not obey the exclusion principle, then they are said to possess Maxwell-Boltzmann statistics. Although physical particles cannot be considered distinguishable, Maxwell-Boltzmann statistics are correct as approximations in certain circumstances to Bose-Einstein and Fermi-Dirac statistics. ◄
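For small $M$ and $n$ the four sample description spaces of Table 5A can be listed outright, which gives an independent check of the table entries and of the probabilities of $A$ computed in example 5B, namely $n!/M^n$, (5.4), and (5.5). In the sketch below a distribution of distinguishable balls is an $n$-tuple of urn numbers and one of indistinguishable balls is a sorted tuple of urn numbers; these encodings and the function names are our own choices.

```python
from fractions import Fraction
from itertools import (combinations, combinations_with_replacement,
                       permutations, product)
from math import comb, factorial, perm


def distributions(M, n, distinguishable, exclusion):
    """All equally likely ways of placing n balls into urns 1..M (Table 5A)."""
    urns = range(1, M + 1)
    if distinguishable:
        # Component j is the urn that receives ball j (ordered sample).
        space = permutations(urns, n) if exclusion else product(urns, repeat=n)
    else:
        # Sorted urn numbers of the deposited balls (unordered sample).
        space = combinations(urns, n) if exclusion else combinations_with_replacement(urns, n)
    return list(space)


def p_first_n_urns_singly_occupied(M, n, distinguishable, exclusion):
    """P[A]: each of urns 1..n holds exactly one ball."""
    space = distributions(M, n, distinguishable, exclusion)
    target = list(range(1, n + 1))
    hits = sum(1 for d in space if sorted(d) == target)
    return Fraction(hits, len(space))


M, n = 5, 3
assert len(distributions(M, n, True, False)) == M ** n               # Maxwell-Boltzmann
assert len(distributions(M, n, False, False)) == comb(M + n - 1, n)  # Bose-Einstein
assert len(distributions(M, n, True, True)) == perm(M, n)
assert len(distributions(M, n, False, True)) == comb(M, n)           # Fermi-Dirac
assert p_first_n_urns_singly_occupied(M, n, True, False) == Fraction(factorial(n), M ** n)
assert p_first_n_urns_singly_occupied(M, n, False, False) == Fraction(1, comb(M + n - 1, n))
assert p_first_n_urns_singly_occupied(M, n, True, True) == Fraction(1, comb(M, n))
assert p_first_n_urns_singly_occupied(M, n, False, True) == Fraction(1, comb(M, n))
```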


The probabilities of various events defined on the general occupancy and sampling problems are summarized in Table 6A on p. 84.

Partitioned Samples. If we examine certain card games, we may notice still another type of sampling. We may extract $n$ distinguishable balls (or cards) from an urn (or deck of cards), which can then be divided into a number of subsets (in a bridge game, into four hands). More precisely, we may specify a positive integer $r$ and nonnegative integers $k_1, k_2, \ldots, k_r$ such that $k_1 + k_2 + \cdots + k_r = n$. We then divide the sample of size $n$ into $r$ subsets: a first subset of size $k_1$, a second subset of size $k_2$, ..., an $r$th subset of size $k_r$. For example, in the game of bridge there are four hands (subsets), each of size 13, called East, North, West, and South (instead of first, second, third, and fourth subsets). The outcome of a sample taken in this way is an $r$-tuple of subsets,

(5.6)   $\left(\{z^{(1)}_1, \ldots, z^{(1)}_{k_1}\},\ \{z^{(2)}_1, \ldots, z^{(2)}_{k_2}\},\ \ldots,\ \{z^{(r)}_1, \ldots, z^{(r)}_{k_r}\}\right),$

whose first component is the first subset, second component is the second subset, ..., $r$th component is the $r$th subset. We call a sample of the form of (5.6) a partitioned sample, with partitioning scheme $(r; k_1, k_2, \ldots, k_r)$.


► Example 5C. An example of partitioned samples. Consider again the experiment of drawing a sample of size 3 from an urn containing four balls, numbered 1 to 4. If the sampling is done without replacement, and the sample is partitioned, with partitioning scheme (2; 1, 2), then the possible samples that could have been drawn are

({1}, {2, 3})   ({2}, {1, 3})   ({3}, {1, 2})   ({4}, {1, 2})
({1}, {2, 4})   ({2}, {1, 4})   ({3}, {1, 4})   ({4}, {1, 3})
({1}, {3, 4})   ({2}, {3, 4})   ({3}, {2, 4})   ({4}, {2, 3})

If the sampling is done with replacement, and the sample is partitioned, with partitioning scheme (2; 1, 2), then the possible samples that could have been drawn are

({1}, {1, 1})   ({2}, {1, 1})   ({3}, {1, 1})   ({4}, {1, 1})
({1}, {1, 2})   ({2}, {1, 2})   ({3}, {1, 2})   ({4}, {1, 2})
({1}, {1, 3})   ({2}, {1, 3})   ({3}, {1, 3})   ({4}, {1, 3})
({1}, {1, 4})   ({2}, {1, 4})   ({3}, {1, 4})   ({4}, {1, 4})
({1}, {2, 2})   ({2}, {2, 2})   ({3}, {2, 2})   ({4}, {2, 2})
({1}, {2, 3})   ({2}, {2, 3})   ({3}, {2, 3})   ({4}, {2, 3})
({1}, {2, 4})   ({2}, {2, 4})   ({3}, {2, 4})   ({4}, {2, 4})
({1}, {3, 3})   ({2}, {3, 3})   ({3}, {3, 3})   ({4}, {3, 3})
({1}, {3, 4})   ({2}, {3, 4})   ({3}, {3, 4})   ({4}, {3, 4})
({1}, {4, 4})   ({2}, {4, 4})   ({3}, {4, 4})   ({4}, {4, 4})   ◄
We next derive formulas for the number of ways in which partitioned samples may be drawn.

In the case of sampling without replacement from an urn containing $M$ balls, numbered 1 to $M$, the number of possible partitioned samples of size $n$, with partitioning scheme $(r; k_1, k_2, \ldots, k_r)$, is equal to

(5.7)   $\binom{M}{k_1}\binom{M - k_1}{k_2} \cdots \binom{M - k_1 - \cdots - k_{r-1}}{k_r} = \dfrac{M!}{k_1!\,k_2! \cdots k_r!\,(M - n)!}.$

Since there are $\binom{M}{k_1}$ possible subsets of $k_1$ balls, $\binom{M - k_1}{k_2}$ possible subsets of $k_2$ balls (there are $M - k_1$ balls available from which to select the second subset), and so on, it follows that there are $\binom{M - k_1 - \cdots - k_{r-1}}{k_r}$ ways in which to select the $r$th subset, and (5.7) is the product of these counts.


In the case of sampling with replacement from an urn containing $M$ balls, numbered 1 to $M$, the number of possible partitioned samples of size $n$, with partitioning scheme $(r; k_1, k_2, \ldots, k_r)$, is equal to

(5.8)   $\binom{M + k_1 - 1}{k_1}\binom{M + k_2 - 1}{k_2} \cdots \binom{M + k_r - 1}{k_r}.$
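The two counting formulas for partitioned samples can be checked against a direct enumeration. The sketch below builds partitioned samples one subset at a time, representing each subset as a sorted tuple; applied to example 5C, with $M = 4$ and partitioning scheme $(2; 1, 2)$, it reproduces the 12 and 40 samples listed there. The function names are hypothetical.

```python
from itertools import combinations, combinations_with_replacement
from math import comb, factorial, prod


def partitioned_samples(M, scheme, with_replacement):
    """All r-tuples of subsets of sizes scheme = (k_1, ..., k_r) from balls 1..M."""
    samples = [()]
    for k in scheme:
        extended = []
        for partial in samples:
            if with_replacement:
                choices = combinations_with_replacement(range(1, M + 1), k)
            else:
                used = {ball for subset in partial for ball in subset}
                available = [ball for ball in range(1, M + 1) if ball not in used]
                choices = combinations(available, k)
            extended.extend(partial + (choice,) for choice in choices)
        samples = extended
    return samples


def count_without(M, scheme):
    """M! / (k_1! ... k_r! (M - n)!), the count for sampling without replacement."""
    n = sum(scheme)
    return factorial(M) // (prod(factorial(k) for k in scheme) * factorial(M - n))


def count_with(M, scheme):
    """Product of C(M + k_i - 1, k_i), the count for sampling with replacement."""
    return prod(comb(M + k - 1, k) for k in scheme)


print(len(partitioned_samples(4, (1, 2), False)), count_without(4, (1, 2)))  # 12 12
print(len(partitioned_samples(4, (1, 2), True)), count_with(4, (1, 2)))      # 40 40
```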

The next example illustrates the theory of partitioned samples and provides a technique whereby card games such as bridge may be analyzed.


► Example 5D. An urn contains fifty-two balls, numbered 1 to 52. Let the balls be drawn one at a time and divided among four players in the following manner: for $j = 1, 2, 3, 4$, balls drawn on trials numbered $j + 4k$ (for $k = 0, 1, \ldots, 12$) are given to player $j$. Thus player 1 gets the balls drawn on the first, fifth, ..., forty-ninth draws, player 2 gets the balls drawn on the second, sixth, ..., fiftieth draws, and so on. Suppose that the balls numbered 1, 11, 31, and 41 are considered "lucky." What is the probability that each player will have a "lucky" ball?

Solution: Dividing the fifty-two balls drawn among four players in the manner described is exactly the same process as drawing, without replacement, a partitioned sample of size 52, with partitioning scheme (4; 13, 13, 13, 13). The sample description space $S$ of the experiment being performed here consists of 4-tuples of mutually exclusive subsets, of size 13, of the integers 1 to 52; in each 4-tuple, for $j = 1, \ldots, 4$, the $j$th subset represents the balls held by the $j$th player. The size of the sample description space is the number of ways in which a sample of fifty-two balls, partitioned in the way we have described, may be drawn from an urn containing fifty-two distinguishable balls. Thus

(5.9)   $N[S] = \binom{52}{13}\binom{39}{13}\binom{26}{13}\binom{13}{13} = \dfrac{52!}{(13!)^4}.$

We next calculate the size of the event $A$ that each of the four players will have exactly one "lucky" ball. First, consider the descriptions in $A$ that have the following properties: player 1 has ball number 11, player 2 has ball number 41, player 3 has ball number 1, and player 4 has ball number 31. Each such description has forty-eight members about which nothing has been specified; consequently there are $48!/(12!)^4$ such descriptions, for in this many ways can the remaining forty-eight balls be distributed among the four players, twelve to each. Now the four "lucky" balls can be distributed among the four hands in $4!$ ways. Consequently,

(5.10)   $N[A] = 4!\,\dfrac{48!}{(12!)^4},$

and the probability that each player will possess exactly one "lucky" ball is given by the quotient of (5.10) and (5.9). ◄
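The quotient of (5.10) and (5.9) is easily evaluated exactly; it simplifies to $4!\,13^4/(52 \cdot 51 \cdot 50 \cdot 49) = 2197/20825 \approx 0.1055$. The sketch below computes it from the two counts and, as a rough sanity check, also simulates the dealing procedure of the example (player $j$ receives the balls drawn on trials $j, j+4, \ldots$). The simulation, its seed, and the number of trials are our own choices, not part of the text.

```python
import random
from fractions import Fraction
from math import comb, factorial

LUCKY = {1, 11, 31, 41}

n_s = comb(52, 13) * comb(39, 13) * comb(26, 13) * comb(13, 13)  # (5.9)
assert n_s == factorial(52) // factorial(13) ** 4
n_a = factorial(4) * factorial(48) // factorial(12) ** 4          # (5.10)
p_exact = Fraction(n_a, n_s)
print(p_exact, float(p_exact))


def simulate(trials: int, seed: int = 1) -> float:
    """Monte Carlo estimate: shuffle the balls, deal draws j, j+4, ... to player j."""
    rng = random.Random(seed)
    balls = list(range(1, 53))
    successes = 0
    for _ in range(trials):
        rng.shuffle(balls)
        hands = [balls[j::4] for j in range(4)]  # hands[0] holds draws 1, 5, ..., 49
        if all(len(LUCKY.intersection(hand)) == 1 for hand in hands):
            successes += 1
    return successes / trials


print(simulate(20_000))
```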


The interested reader may desire to consider for himself the theory of 
partitions that are unordered, rather than ordered, arrays of subsets. 


THEORETICAL EXERCISES

5.1. An urn contains $M$ balls, numbered 1 to $M$. A sample of size $n$ is drawn without replacement, and the numbers on the balls are arranged in increasing order of their numbers: $x_1 < x_2 < \cdots < x_n$. Let $K$ be a number 1 to $M$, and $k$ a number 1 to $n$. Show that the probability that $x_k = K$ is

(5.11)   $\dfrac{\binom{K-1}{k-1}\binom{M-K}{n-k}}{\binom{M}{n}}.$

5.2. The number of unordered samples with replacement. Let $U(M, n)$ denote the number of unordered samples of size $n$ that one may draw, by sampling with replacement, from an urn containing $M$ distinguishable balls. Show that $U(M, n) = \binom{M+n-1}{n}$.

Hint. To prove the assertion, make use of the principle of mathematical induction. Let $P(n)$ be the proposition that, whatever $M$, $U(M, n) = \binom{M+n-1}{n}$. $P(1)$ is clearly true, since there are $M$ unordered samples of size 1. To complete the proof, we must show that $P(n)$ implies $P(n+1)$. The following formula is immediately obtained: for any $M = 1, 2, \ldots$ and $n = 1, 2, \ldots$,

$U(M, n+1) = U(M, n) + U(M-1, n) + \cdots + U(1, n).$

(Compare the example in the text involving an urn containing 4 balls.) Among the unordered samples of size $n + 1$, written in increasing order, there are $U(M, n)$ samples whose first entry is 1, $U(M-1, n)$ whose first entry is 2, and so on, until there are $U(1, n)$ whose first entry is $M$. By the induction hypothesis, $U(k, n) = \binom{k+n-1}{n}$. Consequently, $U(M, n+1) = \sum_{k=1}^{M} \binom{k+n-1}{n}$. We thus determine that $U(M, n+1) = \binom{M+n}{n+1}$, so that $P(n+1)$ is proved, and the asserted formula for $U(M, n)$ is proved by mathematical induction.

5.3. Show that the number of ways in which $n$ indistinguishable objects may be arranged in $M$ distinguishable cells is $\binom{M+n-1}{n} = \binom{M+n-1}{M-1}$.

5.4. Let $n \ge M$. Show that the number of ways in which $n$ indistinguishable objects may be arranged in $M$ distinguishable cells so that no cell will be empty is $\binom{n-1}{M-1} = \binom{n-1}{n-M}$. Hint. It suffices to find the number of ways in which $(n - M)$ indistinguishable objects may be arranged in $M$ distinguishable cells, since after placing 1 object in each cell the remaining objects may be arranged without restriction.


EXERCISES

5.1. On an examination the following question was posed: From a point on the base of a certain mountain there are 5 paths leading to the top of the mountain. In how many ways can one make a round trip (from the base to the top and back again)? Explain why each of the following 4 answers was graded as being correct: (i) $(5)_2 = 20$, (ii) $5^2 = 25$, (iii) $\binom{5}{2} = 10$, (iv) $\binom{6}{2} = 15$.

5.2. A certain young woman has 3 men friends. She is told by a fortune teller that she will be married twice and that both her husbands will come from this group of 3 men. How many possible marital histories can this woman have? Consider 4 cases. (May she marry the same man twice? Does the order in which she marries matter?)

5.3. The legitimate theater in New York gives both afternoon and evening performances on Saturdays. A man comes to New York one Saturday to attend 2 performances (1 in the afternoon and 1 in the evening) of the living theater. There are 6 shows that he might consider attending. In how many ways can he choose 2 shows? Consider 4 cases.

5.4. An urn contains 52 balls, numbered 1 to 52. Let the balls be drawn 1 at a time and divided among 4 people. Suppose that the balls numbered 1, 11, 31, and 41 are considered "lucky." What is the probability that (i) each person will have a "lucky" ball, (ii) 1 person will have all 4 "lucky" balls?

5.5. A bridge player announces that his hand (of 13 cards) contains (i) an ace (that is, at least 1 ace), (ii) the ace of hearts. What is the probability that it will contain another one?

5.6. What is the probability that in a division of a deck of cards into 4 bridge hands, 1 of the hands will contain (i) 13 cards of the same suit, (ii) 4 aces and 4 kings, (iii) 3 aces and 3 kings?

5.7. Prove that the probability of South's receiving exactly $k$ aces when a bridge deck is divided into 4 hands is the same as the probability that a hand of 13 cards drawn from a bridge deck will contain exactly $k$ aces.

5.8. An urn contains 8 balls numbered 1 to 8. Four balls are drawn without replacement; suppose 2 is the second smallest of the 4 numbers drawn. What is the probability that $x_3 = 3$?

5.9. A red card is removed from a bridge deck of 52 cards; 13 cards are then drawn and found to be the same color. Show that the (conditional) probability that all will be black is equal to $\frac{2}{3}$.

5.10. A room contains 10 people who are wearing badges numbered 1 to 10. What is the probability that if 3 persons are selected at random (i) the largest, (ii) the smallest badge number chosen will be 5?


5.11. From a pack of 52 cards an even number of cards is drawn. Show that the probability that half of these cards will be red and half will be black is

$\left[\dfrac{52!}{(26!)^2} - 1\right] \div \left(2^{51} - 1\right).$

Hint. Show, and then use (with $n = 52$), the facts that for any positive integer $n$

(5.12)   $\binom{n}{0} + \binom{n}{2} + \binom{n}{4} + \cdots = \binom{n}{1} + \binom{n}{3} + \binom{n}{5} + \cdots = 2^{n-1},$

(5.13)   $\binom{n}{0}^2 + \binom{n}{1}^2 + \cdots + \binom{n}{n}^2 = \binom{2n}{n} = \dfrac{(2n)!}{(n!)^2}.$


6. THE PROBABILITY OF OCCURRENCE OF A
GIVEN NUMBER OF EVENTS


Consider $M$ events $A_1, A_2, \ldots, A_M$ defined on a probability space. In this section we shall develop formulas for the probabilities of various events defined in terms of the events $A_1, \ldots, A_M$, especially the probabilities, for $m = 0, 1, \ldots, M$, that (i) exactly $m$ of them, (ii) at least $m$ of them, (iii) no more than $m$ of them will occur. With the aid of these formulas, a variety of questions connected with sampling and occupancy problems may be answered.


THEOREM. Let $A_1, A_2, \ldots, A_M$ be $M$ events defined on a probability space. Let the quantities $S_0, S_1, \ldots, S_M$ be defined as follows:

(6.1)
$S_0 = 1$
$S_1 = \sum_{k=1}^{M} P[A_k]$
$S_2 = \sum_{k_1=1}^{M}\ \sum_{k_2=k_1+1}^{M} P[A_{k_1}A_{k_2}]$
$\cdots$
$S_r = \sum_{k_1=1}^{M}\ \sum_{k_2=k_1+1}^{M} \cdots \sum_{k_r=k_{r-1}+1}^{M} P[A_{k_1}A_{k_2}\cdots A_{k_r}]$
$\cdots$
$S_M = P[A_1A_2\cdots A_M]$

The definition of $S_r$ is usually written

(6.1')   $S_r = \sum P[A_{k_1}A_{k_2}\cdots A_{k_r}],$

in which the summation in (6.1') is over the $\binom{M}{r}$ possible subsets $\{k_1, \ldots, k_r\}$ of size $r$ of the set $\{1, 2, \ldots, M\}$.

Then, for any integer $m = 0, 1, \ldots, M$, the probability of the event $B_m$ that exactly $m$ of the $M$ events $A_1, \ldots, A_M$ will occur simultaneously is given by

(6.2)   $P[B_m] = \sum_{r=m}^{M} (-1)^{r-m}\binom{r}{m} S_r = S_m - \binom{m+1}{m} S_{m+1} + \binom{m+2}{m} S_{m+2} - \cdots + (-1)^{M-m}\binom{M}{m} S_M.$

In particular, for $m = 0$,

(6.3)   $P[B_0] = 1 - S_1 + S_2 - S_3 + S_4 - \cdots + (-1)^M S_M.$

The probability that at least $m$ of the $M$ events $A_1, \ldots, A_M$ will occur is given by (for $m \ge 1$)

(6.4)   $P[B_m] + P[B_{m+1}] + \cdots + P[B_M] = \sum_{r=m}^{M} (-1)^{r-m}\binom{r-1}{m-1} S_r.$
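The theorem can be spot-checked on any finite sample space with equally likely outcomes. The sketch below draws a few arbitrary events at random (the space size, number of events, and seed are illustrative assumptions), computes $S_0, \ldots, S_M$ by summing over subsets as in definition (6.1), and compares (6.2) and (6.4) with direct counts. Exact `Fraction` arithmetic makes each comparison an equality rather than an approximation.

```python
import random
from fractions import Fraction
from itertools import combinations
from math import comb


def s_values(events, space_size):
    """S_0, ..., S_M of (6.1); events are sets of equally likely outcomes."""
    M = len(events)
    S = [Fraction(1)]
    for r in range(1, M + 1):
        total = sum(len(set.intersection(*(events[i] for i in idx)))
                    for idx in combinations(range(M), r))
        S.append(Fraction(total, space_size))
    return S


def p_exactly(S, m):  # (6.2)
    return sum((-1) ** (r - m) * comb(r, m) * S[r] for r in range(m, len(S)))


def p_at_least(S, m):  # (6.4), valid for m >= 1
    return sum((-1) ** (r - m) * comb(r - 1, m - 1) * S[r] for r in range(m, len(S)))


rng = random.Random(0)
N, M = 30, 5
events = [set(rng.sample(range(N), rng.randint(5, 20))) for _ in range(M)]
S = s_values(events, N)
occurrences = [sum(w in A for A in events) for w in range(N)]  # events containing w
for m in range(M + 1):
    assert p_exactly(S, m) == Fraction(occurrences.count(m), N)
    if m >= 1:
        assert p_at_least(S, m) == Fraction(sum(c >= m for c in occurrences), N)
print("(6.2) and (6.4) agree with direct counts")
```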


Before giving the proof of this theorem, we shall discuss various applica- 
tions of it. 

► Example 6A. The matching problem (case of sampling without replacement). Suppose that we have $M$ urns, numbered 1 to $M$, and $M$ balls, numbered 1 to $M$. Let the balls be inserted randomly in the urns, with one ball in each urn. If a ball is put into the urn bearing the same number as the ball, a match is said to have occurred. Show that the probability that

(i) at least one match will occur is

(6.5)   $1 - \dfrac{1}{2!} + \dfrac{1}{3!} - \cdots + (-1)^{M+1}\dfrac{1}{M!} \approx 1 - e^{-1} \approx 0.63212,$

(ii) exactly $m$ matches will occur, for $m = 0, 1, \ldots, M$, is

(6.6)   $\dfrac{1}{m!}\left(1 - 1 + \dfrac{1}{2!} - \dfrac{1}{3!} + \cdots + (-1)^{M-m}\dfrac{1}{(M-m)!}\right) \approx \dfrac{1}{m!}\,e^{-1}$ for $M - m$ large.


The matching problem may be formulated in a variety of ways. First variation: if $M$ married gentlemen and their wives (in a monogamous society) draw lots for a dance in such a way that each gentleman is equally likely to dance with any of the $M$ wives, what is the probability that exactly $m$ gentlemen will dance with their own wives? Second variation: if $M$ soldiers who sleep in the same barracks arrive home one evening so drunk that each soldier chooses at random a bed in which to sleep, what is the probability that exactly $m$ soldiers will sleep in their own beds? Third variation: if $M$ letters and $M$ corresponding envelopes are typed by a tipsy typist and the letters are put into the envelopes in such a way that each envelope contains just one letter that is equally likely to be any one of the $M$ letters, what is the probability that exactly $m$ letters will be inserted into their corresponding envelopes? Fourth variation: if two similar decks of $M$ cards (numbered 1 to $M$) are shuffled and dealt simultaneously, one card from each deck at a time, what is the probability that on just $m$ occasions the two cards dealt will bear the same number?

There is a considerable literature on the matching problem that has particularly interested psychologists. The reader may consult papers by D. E. Barton, Journal of the Royal Statistical Society, Vol. 20 (1958), pp. 73-92, and P. E. Vernon, Psychological Bulletin, Vol. 33 (1936), pp. 149-77, which give many references. Other references may be found in an editorial note in the American Mathematical Monthly, Vol. 53 (1946), p. 107. The matching problem was … writers on probability theory. … statement of the matching problem given by De Moivre (Doctrine of Chances, 1714, Problem 35): …

Solution: To describe the distribution of the balls among the urns, write an $M$-tuple $(z_1, z_2, \ldots, z_M)$ whose $j$th component $z_j$ represents the number of the ball inserted in the $j$th urn. For $k = 1, 2, \ldots, M$ the event $A_k$ that a match occurs in the $k$th urn may be written $A_k = \{(z_1, z_2, \ldots, z_M): z_k = k\}$. It is clear that for any integer $r = 1, 2, \ldots, M$ and any $r$ unequal integers $k_1, k_2, \ldots, k_r$, 1 to $M$,

(6.7)   $P[A_{k_1}A_{k_2}\cdots A_{k_r}] = \dfrac{(M-r)!}{M!}.$

It then follows that the sum $S_r$, defined by (6.1), is given by

(6.8)   $S_r = \binom{M}{r}\dfrac{(M-r)!}{M!} = \dfrac{1}{r!}.$

Equations (6.5) and (6.6) now follow immediately from (6.8), (6.3), and (6.2). ◄
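With $S_r = 1/r!$ the matching probabilities are simple to tabulate. The sketch below evaluates (6.6) exactly and compares it with a count over all $M!$ permutations for a small $M$ (here 6, an arbitrary choice); it also prints how quickly the probability (6.5) of at least one match approaches $1 - e^{-1}$. The function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations
from math import exp, factorial


def p_matches(M: int, m: int) -> Fraction:
    """(6.6): probability of exactly m matches among M balls and M urns."""
    return Fraction(1, factorial(m)) * sum(
        Fraction((-1) ** k, factorial(k)) for k in range(M - m + 1))


def p_matches_by_counting(M: int) -> list[Fraction]:
    """Distribution of the number of matches over all M! equally likely insertions."""
    counts = [0] * (M + 1)
    for urn_contents in permutations(range(M)):
        counts[sum(ball == urn for urn, ball in enumerate(urn_contents))] += 1
    return [Fraction(c, factorial(M)) for c in counts]


assert p_matches_by_counting(6) == [p_matches(6, m) for m in range(7)]
for M in (3, 5, 10, 20):
    print(M, float(1 - p_matches(M, 0)), 1 - exp(-1))  # (6.5) versus its limit
```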


► Example 6B. Coupon collecting (case of sampling with replacement). Suppose that a manufacturer gives away in packages of his product certain items (which we take to be coupons, each bearing one of the integers 1 to M) in such a way that each of the M items available is equally likely to be found in any package purchased. If n packages are bought, show that the probability that exactly m of the M integers, 1 to M, will not be obtained (or, equivalently, that exactly M − m of the integers, 1 to M, will be obtained) is equal to

(6.9)  $$\binom{M}{m} \frac{\Delta^{M-m} 0^n}{M^n},$$


where we define, for any integer n, and r = 0, 1, ..., n,

(6.10)  $$\Delta^r 0^n = \sum_{k=0}^{r} (-1)^k \binom{r}{k} (r-k)^n.$$

The symbol Δ is used with the meaning assigned to it in the calculus of finite differences as an operator defined by Δf(x) = f(x + 1) − f(x). We write $\Delta^r(0^n)$ to mean the value at x = 0 of $\Delta^r(x^n)$.

A table of $\Delta^r(0^n)$ for n = 2(1)25 and r = 2(1)n is to be found in Statistical Tables for Agricultural, Biological, and Medical Research (1953), Table XXII.
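The two descriptions of $\Delta^r(0^n)$ (the explicit sum (6.10) and "apply Δ r times to $x^n$, then set x = 0") can be checked against each other with a short Python sketch; the function names are hypothetical. A known identity, not stated in the text, is that $\Delta^r(0^n)$ counts the n-tuples drawn from r symbols in which every symbol appears, which is why it enters the coupon problem; it is 0 when r > n.

```python
from math import comb

def delta_power_zero(r, n):
    """Delta^r(0^n) computed from the explicit sum (6.10)."""
    return sum((-1) ** k * comb(r, k) * (r - k) ** n for k in range(r + 1))

def delta_power_zero_by_differencing(r, n):
    """Apply Delta f(x) = f(x + 1) - f(x) r times to f(x) = x**n, then take x = 0."""
    values = [x ** n for x in range(r + 1)]      # f(0), f(1), ..., f(r)
    for _ in range(r):
        # one application of Delta shortens the list of tabulated values by one
        values = [values[i + 1] - values[i] for i in range(len(values) - 1)]
    return values[0]

for n in range(10):
    for r in range(n + 1):
        assert delta_power_zero(r, n) == delta_power_zero_by_differencing(r, n)

print(delta_power_zero(3, 3), delta_power_zero(2, 4))   # 6 14
```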

The problem of coupon collecting has many variations and practical applications. First variation (the occupancy problem): if n distinguishable balls are distributed among M urns, numbered 1 to M, what is the probability that there will be exactly m urns in which no ball was placed (that is, exactly m urns remain empty after the n balls have been distributed)? Second variation (measuring the intensity of cosmic radiation): if M counters are exposed to a cosmic ray shower and are hit by n rays, what is the probability that precisely M − m counters will go off? Third variation (genetics): if each mouse in a litter of n mice can be classified as belonging to any one of M genotypes, what is the probability that M − m genotypes will be represented among the n mice?

Solution: To describe the coupons found in the n packages purchased, we write an n-tuple $(z_1, z_2, \ldots, z_n)$ whose jth component $z_j$ represents the number of the coupon found in the jth package purchased. We now define events $A_1, A_2, \ldots, A_M$. For k = 1, 2, ..., M, $A_k$ is the event that the number k will not appear in the sample; in symbols,

(6.11)  $$A_k = \{(z_1, z_2, \ldots, z_n): z_j \neq k \text{ for } j = 1, 2, \ldots, n\}.$$


80 BASIC PROBABILITY THEORY CH. 2 


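It is easy to obtain the probability of the intersection of any number of the events $A_1, \ldots, A_M$. We have

(6.12)  $$P[A_k] = \left(\frac{M-1}{M}\right)^n = \left(1 - \frac{1}{M}\right)^n, \qquad k = 1, \ldots, M,$$

and, in general, for any r unequal integers $k_1, \ldots, k_r$, 1 to M,

$$P[A_{k_1} A_{k_2} \cdots A_{k_r}] = \left(1 - \frac{r}{M}\right)^n.$$

The quantities $S_r$, defined by (6.1), are then given by

(6.13)  $$S_r = \binom{M}{r} \left(1 - \frac{r}{M}\right)^n, \qquad r = 0, 1, \ldots, M.$$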


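Let $B_m$ be the event that exactly m of the integers 1 to M will not be found in the sample. Clearly $B_m$ is the event that exactly m of the events $A_1, A_2, \ldots, A_M$ will occur. By (6.2) and (6.13),

(6.14)  $$P[B_m] = \sum_{r=m}^{M} (-1)^{r-m} \binom{r}{m} \binom{M}{r} \left(1 - \frac{r}{M}\right)^n = \binom{M}{m} \sum_{k=0}^{M-m} (-1)^k \binom{M-m}{k} \left(\frac{M-m-k}{M}\right)^n,$$

since $\binom{r}{m}\binom{M}{r} = \binom{M}{m}\binom{M-m}{r-m}$ and we may set r = m + k. Comparing the last sum with (6.10), with r replaced by M − m, shows that (6.14) coincides with (6.9).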
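For small M and n all $M^n$ equally likely n-tuples can be enumerated, so (6.9), the alternating sum in (6.14), and a direct count can be compared exactly. A minimal Python sketch, with hypothetical function names:

```python
from fractions import Fraction
from itertools import product
from math import comb

def delta_power_zero(r, n):
    # Delta^r(0^n) per (6.10)
    return sum((-1) ** k * comb(r, k) * (r - k) ** n for k in range(r + 1))

def p_missing_formula(M, n, m):
    """(6.9): P[exactly m of the integers 1..M are not obtained in n packages]."""
    return Fraction(comb(M, m) * delta_power_zero(M - m, n), M ** n)

def p_missing_inclusion_exclusion(M, n, m):
    """(6.14): alternating sum built from S_r = C(M, r)(1 - r/M)^n of (6.13)."""
    return sum((-1) ** (r - m) * comb(r, m) * comb(M, r) * Fraction(M - r, M) ** n
               for r in range(m, M + 1))

def p_missing_brute(M, n, m):
    """Count the n-tuples (z_1, ..., z_n) in which exactly m numbers never appear."""
    hits = sum(1 for z in product(range(1, M + 1), repeat=n)
               if M - len(set(z)) == m)
    return Fraction(hits, M ** n)

for M in range(1, 5):
    for n in range(1, 6):
        for m in range(M + 1):
            a = p_missing_formula(M, n, m)
            assert a == p_missing_inclusion_exclusion(M, n, m) == p_missing_brute(M, n, m)

print(p_missing_formula(3, 4, 0))   # 4/9: all 3 coupons appear in 36 of the 81 tuples
```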


Other applications of the theorem stated at the beginning of this section 


... "A Unified Derivation of Some ... Interest in Biometry and Statis..." ... ety, Series A, Vol. 118 (1955), pp. 389-404 (including discussion).

* The remainder of this section may be omitted in a first reading of the book.




on events in terms of arithmetic operations. Given an event A on a sample description space S, we define its indicator function, denoted by I(A), as a function defined on S, with value at any description s, denoted by I(A; s), equal to 1 or 0, depending on whether the description s does or does not belong to A.

The two basic properties of indicator functions, which enable us to 
operate with them, are the following. 

First, a product of indicator functions can always be replaced by a single indicator function; more precisely, for any events $A_1, A_2, \ldots, A_n$,

(6.15)  $$I(A_1) I(A_2) \cdots I(A_n) = I(A_1 A_2 \cdots A_n),$$

so that the product of the indicator functions of the sets $A_1, A_2, \ldots, A_n$ is equal to the indicator function of the intersection $A_1 A_2 \cdots A_n$. Equation (6.15) is an equation involving functions; strictly speaking, it is a brief method of expressing the following family of equations: for every description s

$$I(A_1; s) I(A_2; s) \cdots I(A_n; s) = I(A_1 A_2 \cdots A_n; s).$$

To prove (6.15), one need note only that $I(A_1 A_2 \cdots A_n; s) = 0$ if and only if s does not belong to $A_1 A_2 \cdots A_n$. This is so if and only if, for some j = 1, ..., n, s does not belong to $A_j$, which is equivalent to, for some j = 1, ..., n, $I(A_j; s) = 0$, which is equivalent to the product $I(A_1; s) \cdots I(A_n; s) = 0$.


Second, a sum of indicator functions can in certain circumstances be replaced by a single indicator function; more precisely, if the events $A_1, A_2, \ldots, A_n$ are mutually exclusive, then

(6.16)  $$I(A_1) + I(A_2) + \cdots + I(A_n) = I(A_1 \cup A_2 \cup \cdots \cup A_n).$$

The proof of (6.16) is left to the reader. One case, in which n = 2 and $A_2 = A_1^c$, is of especial importance. Then $A_1 \cup A_2 = S$, and $I(A_1 \cup A_2)$ is identically equal to 1. Consequently, we have, for any event A,

(6.17)  $$I(A) + I(A^c) = 1, \qquad I(A^c) = 1 - I(A).$$


From (6.15) to (6.17) we may derive expressions for the indicator functions of various events. For example, let A and B be any two events. Then

(6.18)  $$\begin{aligned} I(A \cup B) &= 1 - I(A^c B^c) \\ &= 1 - I(A^c) I(B^c) \\ &= 1 - [1 - I(A)][1 - I(B)] \\ &= I(A) + I(B) - I(AB). \end{aligned}$$




Our ability to write expressions in the manner of (6.18) for the indicator functions of compound events, in terms of the indicator functions of the events of which they are composed, derives its importance from the following fact: an equation involving only sums and differences (but not products) of indicator functions leads immediately to a corresponding equation involving probabilities; this relation is obtained by replacing I(·) by P[·]. For example, if one makes this replacement in (6.18), one obtains the well-known formula P[A ∪ B] = P[A] + P[B] − P[AB].
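The identities (6.15)-(6.18) are statements about functions on S, so on a finite space they can be checked description by description. A minimal Python sketch, assuming a hypothetical six-point space with equally likely descriptions and events represented as frozensets:

```python
from fractions import Fraction

S = frozenset(range(1, 7))          # hypothetical sample description space

def indicator(A):
    """I(A): the function s -> 1 if s belongs to A, else 0."""
    return lambda s: 1 if s in A else 0

A, B = frozenset({1, 2, 3}), frozenset({3, 4})
A1, A2 = frozenset({1}), frozenset({2, 5})          # a mutually exclusive pair
IA, IB = indicator(A), indicator(B)

for s in S:
    assert IA(s) * IB(s) == indicator(A & B)(s)                          # (6.15)
    assert indicator(A1)(s) + indicator(A2)(s) == indicator(A1 | A2)(s)  # (6.16)
    assert IA(s) + indicator(S - A)(s) == 1                              # (6.17)
    assert indicator(A | B)(s) == IA(s) + IB(s) - indicator(A & B)(s)    # (6.18)

# Replacing I(.) by P[.] in (6.18), assuming equally likely descriptions:
def P(E):
    return Fraction(len(E), len(S))

assert P(A | B) == P(A) + P(B) - P(A & B)
print(P(A | B))   # 2/3
```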

The principle just enunciated is a special case of the additivity property of the operation of taking the expected value of a function defined on a probability space. This is discussed in Chapter 8, but we shall sketch a proof here of the principle stated. We prove a somewhat more general assertion. Let f(·) be a function defined on a sample description space. Suppose that the possible values of f(·) are integers from −N(f) to N(f) for some integer N(f) that will depend on f(·). We may then represent f(·) as a linear combination of indicator functions:

(6.19)  $$f(\cdot) = \sum_{k=-N(f)}^{N(f)} k\, I[D_k(f)],$$

in which $D_k(f) = \{s: f(s) = k\}$ is the set of descriptions at which f(·) takes the value k. Define the expected value of f(·), denoted by E[f(·)]:

(6.20)  $$E[f(\cdot)] = \sum_{k=-N(f)}^{N(f)} k\, P[D_k(f)].$$

In words, E[f(·)] is equal to the sum, over all possible values k of the function f(·), of the product of the value k and the probability that f(·) will assume the value k. In particular, if f(·) is an indicator function, so that f(·) = I(A) for some set A, then E[f(·)] = P[A]. Consider now another function g(·), which may be written

(6.21)  $$g(\cdot) = \sum_{j=-N(g)}^{N(g)} j\, I[D_j(g)].$$

We now prove the basic additivity theorem that

(6.22)  $$E[f(\cdot) + g(\cdot)] = E[f(\cdot)] + E[g(\cdot)].$$


The sum f(·) + g(·) of the two functions is a function whose possible values are integers −N to N, in which N = N(f) + N(g). However, we may represent the function f(·) + g(·) in terms of the indicator functions $I[D_k(f)]$ and $I[D_j(g)]$:

$$f(\cdot) + g(\cdot) = \sum_{k=-N(f)}^{N(f)} \sum_{j=-N(g)}^{N(g)} (k + j)\, I[D_k(f) D_j(g)].$$




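Therefore, since for each value the sets $D_k(f) D_j(g)$ with k + j equal to that value are mutually exclusive and their union is the set on which f(·) + g(·) takes that value, and since for fixed k the sets $D_k(f) D_j(g)$, j = −N(g), ..., N(g), are mutually exclusive with union $D_k(f)$ (and similarly for fixed j),

$$\begin{aligned} E[f(\cdot) + g(\cdot)] &= \sum_{k=-N(f)}^{N(f)} \sum_{j=-N(g)}^{N(g)} (k + j)\, P[D_k(f) D_j(g)] \\ &= \sum_{k=-N(f)}^{N(f)} k \sum_{j=-N(g)}^{N(g)} P[D_k(f) D_j(g)] + \sum_{j=-N(g)}^{N(g)} j \sum_{k=-N(f)}^{N(f)} P[D_k(f) D_j(g)] \\ &= \sum_{k=-N(f)}^{N(f)} k\, P[D_k(f)] + \sum_{j=-N(g)}^{N(g)} j\, P[D_j(g)] \\ &= E[f(\cdot)] + E[g(\cdot)], \end{aligned}$$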


and the proof of (6.22) is complete. By mathematical induction we determine from (6.22) that, for any n functions $f_1(\cdot), f_2(\cdot), \ldots, f_n(\cdot)$,

(6.23)  $$E[f_1(\cdot) + f_2(\cdot) + \cdots + f_n(\cdot)] = E[f_1(\cdot)] + E[f_2(\cdot)] + \cdots + E[f_n(\cdot)].$$

Finally, from (6.23), and the fact that E[I(A)] = P[A], we obtain the principle we set out to prove; namely, that, for any events $A, A_1, \ldots, A_n$, if

(6.24)  $$I(A) = c_1 I(A_1) + c_2 I(A_2) + \cdots + c_n I(A_n),$$

in which the $c_i$ are either +1 or −1, then

(6.25)  $$P[A] = c_1 P[A_1] + c_2 P[A_2] + \cdots + c_n P[A_n].$$
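Definition (6.20) and the additivity (6.22) can be exercised on a small space whose descriptions are not equally likely. The three-point space, its probabilities, and the functions f and g below are hypothetical choices made only for illustration.

```python
from fractions import Fraction

# Hypothetical space with unequal probabilities summing to 1.
prob = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}

def expectation(f):
    """E[f] per (6.20): sum over values k of k * P[D_k(f)], D_k(f) = {s: f(s) = k}."""
    values = {f(s) for s in prob}
    return sum(k * sum(p for s, p in prob.items() if f(s) == k) for k in values)

f = {"a": 2, "b": -1, "c": 0}.get
g = {"a": 1, "b": 3, "c": -2}.get
h = lambda s: f(s) + g(s)

assert expectation(h) == expectation(f) + expectation(g)   # (6.22)

# (6.24) -> (6.25): I(A u B) = I(A) + I(B) - I(AB) gives P[A u B] = P[A] + P[B] - P[AB].
A, B = {"a", "b"}, {"b", "c"}
I = lambda E: (lambda s: 1 if s in E else 0)
P = lambda E: sum(prob[s] for s in E)
assert expectation(I(A | B)) == P(A | B) == P(A) + P(B) - P(A & B)
```

Here E[f] = 2/3, E[g] = 7/6, and E[f + g] = 11/6, so the additivity holds exactly.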
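We now apply the foregoing considerations to derive (6.2). The event $B_m$, that exactly m of the M events $A_1, A_2, \ldots, A_M$ occur, may be expressed as the union, over all subsets $J_m = \{i_1, \ldots, i_m\}$ of size m of the integers {1, 2, ..., M}, of the events $A_{i_1} \cdots A_{i_m} A_{i_{m+1}}^c \cdots A_{i_M}^c$, where $\{i_{m+1}, \ldots, i_M\}$ denotes the integers 1 to M not in $J_m$; there are $\binom{M}{m}$ such events, and they are mutually exclusive. Consequently, by (6.15)-(6.17),

(6.26)  $$I(B_m) = \sum_{J_m} I(A_{i_1}) \cdots I(A_{i_m}) [1 - I(A_{i_{m+1}})] \cdots [1 - I(A_{i_M})].$$

Now each term in (6.26) may be written

(6.27)  $$I(A_{i_1}) \cdots I(A_{i_m}) [1 - H_1(J_m) + H_2(J_m) - \cdots + (-1)^{M-m} H_{M-m}(J_m)],$$

where we define $H_k(J_m) = \sum I(A_{j_1} \cdots A_{j_k})$, in which the summation is over all subsets $\{j_1, \ldots, j_k\}$ of size k of the set of integers $\{i_{m+1}, \ldots, i_M\}$. Now because of symmetry and (6.15), one sees that

(6.28)  $$\sum_{J_m} I(A_{i_1}) \cdots I(A_{i_m}) H_k(J_m) = \binom{m+k}{m} H_{m+k},$$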


in which $H_{m+k} = \sum I(A_{j_1} \cdots A_{j_{m+k}})$, the summation being over all subsets of size k + m of the set of integers {1, 2, ..., M}. To see (6.28), note that there are $\binom{M}{m}$ terms in $\sum_{J_m}$, $\binom{M-m}{k}$ terms in $H_k(J_m)$, and $\binom{M}{m+k}$ terms in $H_{m+k}$, and use the fact that

$$\binom{M}{m}\binom{M-m}{k} = \binom{M}{m+k}\binom{m+k}{m}.$$

Finally, from (6.24) to (6.28) and (6.1) we obtain

(6.29)  $$P[B_m] = \sum_{k=0}^{M-m} (-1)^k \binom{m+k}{m} S_{m+k},$$

which is the same as (6.2). Equation (6.4) follows immediately from (6.2) by induction.

[Table 6A. The probabilities of various events defined on the general occupancy and sampling problems. Columns: distributing n balls in M distinguishable urns (distinguishable or indistinguishable balls, with or without exclusion), or equivalently drawing samples of size n from an urn containing M distinguishable balls (ordered or unordered samples, with or without replacement). The formula entries of rows I-IV are not legible in this copy.]
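Because the derivation of (6.29) never uses equally likely descriptions, the formula can be tested on any finite space. The sketch below uses a hypothetical eight-point space with unequal probabilities and three arbitrary events, computes $S_r$ as in (6.1), and compares (6.29) with a direct computation of $P[B_m]$.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

# Hypothetical space: descriptions 0..7 with probabilities (s + 1)/36 (they sum to 1).
prob = {s: Fraction(s + 1, 36) for s in range(8)}
events = [{0, 1, 2, 5}, {1, 3, 5, 6}, {2, 3, 5, 7}]
M = len(events)

def P(E):
    return sum(prob[s] for s in E)

def S(r):
    """S_r as in (6.1): sum of P over all r-fold intersections; S_0 = 1."""
    total = Fraction(0)
    for idx in combinations(range(M), r):
        inter = set(prob)                 # the empty intersection is the whole space
        for i in idx:
            inter &= events[i]
        total += P(inter)
    return total

def p_exactly(m):
    """(6.29): P[B_m] = sum_k (-1)^k C(m+k, m) S_{m+k}."""
    return sum((-1) ** k * comb(m + k, m) * S(m + k) for k in range(M - m + 1))

for m in range(M + 1):
    direct = P({s for s in prob if sum(s in E for E in events) == m})
    assert p_exactly(m) == direct
```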


THEORETICAL EXERCISES

6.1. Matching problem (case of sampling without replacement). Show that for j = 1, ..., M the conditional probability of a match in the jth urn, given that there are m matches, is m/M. Show, for any 2 unequal integers j and k, 1 to M, that the conditional probability that the ball number j was placed in urn number k, given that there are m matches, is equal to (M − m)/[M(M − 1)].

6.2. Matching (case of sampling with replacement). Consider the matching problem under the assumption that, for j = 1, 2, ..., M, the ball inserted in the jth urn was chosen randomly from all the M balls available (and then made available as a candidate for insertion in the (j + 1)st urn). Show that the probability of at least 1 match is $1 - [1 - (1/M)]^M \doteq 1 - e^{-1} = 0.63212$. Find the probability of exactly m matches.

6.3. A man addresses n envelopes and writes n checks in payment of n bills.

(i) If the n bills are placed at random in the n envelopes, show that the probability that each bill will be placed in the wrong envelope is

$$\sum_{k=2}^{n} \frac{(-1)^k}{k!}.$$

(ii) If the n bills and n checks are placed at random in the n envelopes, 1 in each envelope, show that the probability that in no instance will the enclosures be completely correct is

$$\sum_{k=0}^{n} (-1)^k \frac{(n-k)!}{n!\, k!}.$$

(iii) In part (ii) the probability that each bill and each check will be in a wrong envelope is equal to the square of the answer to part (i).

6.4. A sampling (or coupon collecting) problem. Consider an urn that contains rM balls, for given integers r and M. Suppose that for each integer j, 1 to M, exactly r balls bear the integer j. Find the probability that in a sample of size n (in which n > M), drawn without replacement from the urn, exactly m of the integers 1 to M will be missing.

Hint: $S_k = \binom{M}{k} \binom{r(M-k)}{n} \Big/ \binom{rM}{n}$.

6.5. Verify the formulas in row I of Table 6A.

6.6. Verify the formulas in row II of Table 6A.

6.7. Verify the formulas in row III of Table 6A.

6.8. Verify the formulas in row IV of Table 6A.


EXERCISES

6.1. If 10 indistinguishable balls are distributed among 7 urns in such a way that all arrangements are equally likely, what is the probability that (i) a specified urn will contain 3 balls, (ii) all urns will be occupied, (iii) exactly 5 urns will be empty?

6.2. If 7 indistinguishable balls are distributed among 10 urns in such a way that not more than 1 ball may be put in any urn and all such arrangements are equally likely, what is the probability that (i) a specified urn will contain 1 ball, (ii) exactly 3 of the first 4 urns will be empty?

6.3. If 10 distinguishable balls are distributed among 4 urns in such a way that all arrangements are equally likely, what is the probability that (i) a specified urn will contain 6 balls, (ii) the first urn will contain 4 balls, the second urn will contain 3 balls, the third urn will contain 2 balls, and the fourth urn will contain 1 ball, (iii) all urns will be occupied?

6.4. Consider 5 families, each consisting of 4 persons. If it is reported that 6 of the 20 individuals in these families have a contagious disease, what is the probability that (i) exactly 2, (ii) at least 3 of the families will be quarantined?

6.5. Write out (6.2) and (6.4) for (i) M = 2 and m = 0, 1, 2, (ii) M = 3 and m = 0, 1, 2, 3, (iii) M = 4 and m = 0, 1, 2, 3, 4.


CHAPTER 3

Independence and Dependence


In this chapter we show how to treat probability problems involving 
finite sample description spaces, in which the descriptions are not necessarily 
equally likely, by using the notions of independent and dependent events 
and trials. 


1. INDEPENDENT EVENTS AND FAMILIES OF EVENTS 


The notions of independent and dependent events play a central role in probability theory. Certain relations, which recur again and again in probability problems, may be given a general formulation in terms of these notions. If the events A and B have the property that the conditional probability of B, given A, is equal to the unconditional probability of B, one intuitively feels that the event B is statistically independent of A, in the sense that the probability of B having occurred is not affected by the knowledge that A has occurred. We are thus led to the following formal definition.

DEFINITION OF AN EVENT B BEING INDEPENDENT OF AN EVENT A WHICH HAS POSITIVE PROBABILITY. Let A and B be events defined on the same probability space S. Assume P[A] > 0, so that P[B | A] is well defined.



88 INDEPENDENCE AND DEPENDENCE CH.3 


The event B is said to be independent (or statistically independent) of the event A if the conditional probability of B, given A, is equal to the unconditional probability of B; in symbols, B is independent of A if

(1.1)  $$P[B \mid A] = P[B].$$


Now suppose that both A and B have positive probability. Then both P[A | B] and P[B | A] are well defined, and from (4.6) of Chapter 2 it follows that

(1.2)  $$P[AB] = P[B \mid A]\, P[A] = P[A \mid B]\, P[B].$$


If B is independent of A, it then follows that A is independent of B, since from (1.1) and (1.2) it follows that P[A | B] = P[A]. It further follows from (1.1) and (1.2) that

(1.3)  $$P[AB] = P[A]\, P[B].$$

By means of (1.3), a definition may be given of two events being independent, in which the two events play a symmetrical role.

DEFINITION OF INDEPENDENT EVENTS. Let A and B be events defined on the same probability space. The events A and B are said to be independent if (1.3) holds.


► Example 1A. Consider the problem of drawing with replacement a sample of size 2 from an urn containing four white and two red balls. Let A denote the event that the first ball drawn is white and B, the event that the second ball drawn is white. By (2.5), in Chapter 2, $P[AB] = (2/3)^2$, whereas P[A] = P[B] = 2/3. In view of (1.3), the events A and B are independent. ◄
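Example 1A can be confirmed by listing all ordered samples. The sketch below labels the balls w1-w4 and r1-r2 (hypothetical labels) and, as an extra contrast not made in the example, repeats the computation for sampling without replacement, where the product rule (1.3) fails.

```python
from fractions import Fraction
from itertools import permutations, product

balls = ["w1", "w2", "w3", "w4", "r1", "r2"]       # four white, two red
is_white = lambda b: b.startswith("w")

def check(samples):
    """Return (P[A], P[B], P[AB]) over equally likely ordered samples (x, y)."""
    n = len(samples)
    pA = Fraction(sum(is_white(x) for x, _ in samples), n)
    pB = Fraction(sum(is_white(y) for _, y in samples), n)
    pAB = Fraction(sum(is_white(x) and is_white(y) for x, y in samples), n)
    return pA, pB, pAB

with_repl = list(product(balls, repeat=2))       # 36 ordered samples
without_repl = list(permutations(balls, 2))      # 30 ordered samples

pA, pB, pAB = check(with_repl)
assert (pA, pB, pAB) == (Fraction(2, 3), Fraction(2, 3), Fraction(4, 9))
assert pAB == pA * pB                            # independent, as in Example 1A

pA, pB, pAB = check(without_repl)
assert pAB == Fraction(2, 5) and pA * pB == Fraction(4, 9)   # dependent
```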
Two events that do not satisfy (1.3) are said to be dependent (although a more precise terminology would be nonindependent). Clearly, to say that two events are dependent is not very informative, for two events, A and B, are dependent if and only if P[AB] ≠ P[A]P[B]. However, it is possible to classify dependent events to a certain extent, and this is done later. (See section 5.)


It should be noted that two mutually exclusive events, A and B, are independent if and only if P[A]P[B] = 0, which is so if and only if either A or B has probability zero.

► Example 1B. Mutually exclusive events. Let a sample of size 2 be drawn from an urn containing six balls, of which four are white. Let C denote the event that exactly one of the balls drawn is white, and let D denote the event that both balls drawn are white. The events C and D are mutually exclusive and are not independent, whether the sample is drawn with or without replacement. ◄


SEC. 1 INDEPENDENT EVENTS AND FAMILIES OF EVENTS 89 


► Example 1C. A paradox? Choose a summer day at random on which both the Dodgers and the Giants are playing baseball games. Let A be the event that the Dodgers win, and let B be the event that the Giants win. If the Dodgers and the Giants are not playing each other, then we may consider the events A and B as independent but not mutually exclusive. If the Giants and the Dodgers are playing each other, then we may consider the events A and B as mutually exclusive but not independent. To resolve this paradox, one need note only that the probability space on which the events A and B are defined is not the same in the two cases. (See example 2B.) ◄

The notions of independent events and of conditional probability may be extended to more than two events. Suppose one has three events A, B, and C defined on a probability space. What are we to mean by the conditional probability of the event C, given that the events A and B have occurred, denoted by P[C | A, B]? From the point of view of the frequency interpretation of probability, by P[C | A, B] we mean the fraction of occurrences of both A and B on which C also occurs. Consequently, we make the formal definition that

(1.4)  $$P[C \mid A, B] = P[C \mid AB] = \frac{P[ABC]}{P[AB]}$$

if P[AB] > 0; P[C | A, B] is undefined if P[AB] = 0.

Next, what do we mean by the statement that the event C is independent of the events A and B? It would seem that we should mean that the conditional probability of C, given either A or B or the intersection AB, is equal to the unconditional probability of C. We therefore make the following formal definition. The events A, B, and C, defined on the same probability space, are said to be independent (or statistically independent) if

(1.5)  $$P[AB] = P[A]P[B], \qquad P[AC] = P[A]P[C], \qquad P[BC] = P[B]P[C],$$

(1.6)  $$P[ABC] = P[A]P[B]P[C].$$

If (1.5) and (1.6) hold, it then follows that (assuming that the events A, B, C, AB, AC, BC have positive probability, so that the conditional probabilities written below are well defined)

(1.7)  $$\begin{aligned} P[A \mid B, C] &= P[A \mid B] = P[A \mid C] = P[A], \\ P[B \mid A, C] &= P[B \mid A] = P[B \mid C] = P[B], \\ P[C \mid A, B] &= P[C \mid A] = P[C \mid B] = P[C]. \end{aligned}$$

Conversely, if all the relations in (1.7) hold, then all the relations in (1.5) and (1.6) hold.




It is to be emphasized that (1.5) does not imply (1.6), so that three events, A, B, and C, which are pairwise independent [in the sense that (1.5) holds], are not necessarily independent. To see this, consider the following example.


► Example 1D. Pairwise independent events that are not independent. Let a ball be drawn from an urn containing four balls, numbered 1 to 4. Assume that S = {1, 2, 3, 4} possesses equally likely descriptions. The events A = {1, 2}, B = {1, 3}, and C = {1, 4} satisfy (1.5) but do not satisfy (1.6). Indeed, P[C | A, B] = 1 ≠ 1/2 = P[C] = P[C | A] = P[C | B]. The reader may find it illuminating to explain in words why P[C | A, B] = 1. ◄
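The arithmetic behind Example 1D can be laid out directly with exact fractions; this is a minimal sketch, with events represented as Python sets.

```python
from fractions import Fraction

S = {1, 2, 3, 4}                      # equally likely descriptions
P = lambda E: Fraction(len(E & S), len(S))
A, B, C = {1, 2}, {1, 3}, {1, 4}

# (1.5): pairwise independence holds
assert P(A & B) == P(A) * P(B)
assert P(A & C) == P(A) * P(C)
assert P(B & C) == P(B) * P(C)

# (1.6) fails: P[ABC] = 1/4, whereas P[A]P[B]P[C] = 1/8
assert P(A & B & C) == Fraction(1, 4) != P(A) * P(B) * P(C)

# Hence P[C | A, B] = P[ABC]/P[AB] = 1, while P[C] = 1/2
assert P(A & B & C) / P(A & B) == 1 and P(C) == Fraction(1, 2)
```

The point is visible in the sets themselves: AB = {1}, and 1 belongs to C.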


► Example 1E. The joint credibility of witnesses. Consider an automobile accident on a city street in which car I stops suddenly and is hit from behind by car II. Suppose that three persons, whom we call A', B', and C', witness the accident. Suppose the probability that each witness has correctly observed that car I stopped suddenly is estimated by having the witnesses observe a number of contrived incidents about which each is then questioned. Assume that it is found that A' has probability 0.9 of stating that car I stopped suddenly, B' has probability 0.8 of stating that car I stopped suddenly, and C' has probability 0.7 of stating that car I stopped suddenly. Let A, B, and C denote, respectively, the events that persons A', B', and C' will state that car I stopped suddenly. Assuming that A, B, and C are independent events, what is the probability that (i) A', B', and C' will state that car I stopped suddenly, (ii) exactly two of them will state that car I stopped suddenly?

Solution: By independence, the probability P[ABC] that all three witnesses will state that car I stopped suddenly is given by P[ABC] = P[A]P[B]P[C] = (0.9)(0.8)(0.7) = 0.504. It is subsequently shown that if A, B, and C are independent events then A, B, and $C^c$ are independent events (and the same holds when any one of the three events is replaced by its complement). Consequently, the probability that exactly two of the witnesses will state that car I stopped suddenly is given by

$$\begin{aligned} P[ABC^c \cup AB^cC \cup A^cBC] &= P[A]P[B]P[C^c] + P[A]P[B^c]P[C] + P[A^c]P[B]P[C] \\ &= (0.9)(0.8)(0.3) + (0.9)(0.2)(0.7) + (0.1)(0.8)(0.7) \\ &= 0.398. \end{aligned}$$

The probability that at least two of the witnesses will state that car I stopped suddenly is 0.504 + 0.398 = 0.902. It should be noted that the sample description space S on which the events A, B, and C are defined is the space of 3-tuples $(z_1, z_2, z_3)$ in which $z_1$ is equal to "yes" or "no," depending on whether person A' says that car I did or did not stop suddenly; components $z_2$ and $z_3$ are defined similarly with respect to persons B' and C'. ◄
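The solution sums, over the relevant 3-tuples of "yes"/"no" answers, products of probabilities justified by independence. A short Python sketch of that computation, using the example's probabilities 0.9, 0.8, 0.7 (the function name is illustrative):

```python
from itertools import product

p = {"A'": 0.9, "B'": 0.8, "C'": 0.7}   # P[witness states that car I stopped suddenly]

def prob_exactly(k):
    """Sum over 3-tuples (z1, z2, z3) with exactly k 'yes' answers, using independence."""
    total = 0.0
    for answers in product([True, False], repeat=3):
        if sum(answers) == k:
            term = 1.0
            for says_yes, pi in zip(answers, p.values()):
                term *= pi if says_yes else 1 - pi
            total += term
    return total

all_three = prob_exactly(3)
exactly_two = prob_exactly(2)
print(round(all_three, 3), round(exactly_two, 3), round(all_three + exactly_two, 3))
# 0.504 0.398 0.902
```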

We next define the notions of independence and of conditional probability for n events $A_1, A_2, \ldots, A_n$.

We define the conditional probability of $A_n$, given that the events $A_1, A_2, \ldots, A_{n-1}$ have occurred, denoted by $P[A_n \mid A_1, A_2, \ldots, A_{n-1}]$, by

(1.8)  $$P[A_n \mid A_1, A_2, \ldots, A_{n-1}] = P[A_n \mid A_1 A_2 \cdots A_{n-1}] = \frac{P[A_1 A_2 \cdots A_n]}{P[A_1 A_2 \cdots A_{n-1}]}$$

if $P[A_1 A_2 \cdots A_{n-1}] > 0$.

We define the events $A_1, A_2, \ldots, A_n$ as independent (or statistically independent) if for every choice of k integers $i_1 < i_2 < \cdots < i_k$ from 1 to n

(1.9)  $$P[A_{i_1} A_{i_2} \cdots A_{i_k}] = P[A_{i_1}]\, P[A_{i_2}] \cdots P[A_{i_k}].$$

Equation (1.9) implies that for any choice of integers $i_1 < i_2 < \cdots < i_k$ from 1 to n (for which the following conditional probability is defined) and for any integer j from 1 to n not equal to $i_1, i_2, \ldots, i_k$, one has

(1.10)  $$P[A_j \mid A_{i_1}, A_{i_2}, \ldots, A_{i_k}] = P[A_j].$$
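Definition (1.9) requires the product rule for every subfamily of two or more events, not just for pairs. A minimal Python sketch of such a check on a finite space follows; the helper name `independent` and the coin-tossing space are hypothetical illustrations.

```python
from fractions import Fraction
from itertools import combinations, product

def independent(events, prob):
    """Check (1.9): every choice of k >= 2 of the events satisfies the product rule."""
    def P(E):
        return sum((prob[s] for s in E), Fraction(0))
    for k in range(2, len(events) + 1):
        for idx in combinations(range(len(events)), k):
            inter = set(prob)                 # start from the whole space S
            prod_p = Fraction(1)
            for i in idx:
                inter &= events[i]
                prod_p *= P(events[i])
            if P(inter) != prod_p:
                return False
    return True

# Three tosses of a fair coin, all 8 outcomes equally likely; A_i = "toss i is heads".
space = list(product("HT", repeat=3))
prob = {s: Fraction(1, 8) for s in space}
A = [{s for s in space if s[i] == "H"} for i in range(3)]
assert independent(A, prob)

# A dependent pair: A_1 and A_1 u A_2 (P of the intersection is 1/2, the product is 3/8).
assert not independent([A[0], A[0] | A[1]], prob)
```

Checking all subfamilies costs $2^n - n - 1$ product comparisons, which is fine for the small n of textbook examples.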

We next consider families of independent events, for independent events never occur alone. Let $\mathcal{A}$ and $\mathcal{B}$ be two families of events; that is, $\mathcal{A}$ and $\mathcal{B}$ are sets whose members are events on some sample description space S. Two families of events $\mathcal{A}$ and $\mathcal{B}$ are said to be independent if any two events A and B, selected from $\mathcal{A}$ and $\mathcal{B}$, respectively, are independent. More generally, n families of events $(\mathcal{A}_1, \mathcal{A}_2, \ldots, \mathcal{A}_n)$ are said to be independent if any set of n events $A_1, A_2, \ldots, A_n$ (where $A_1$ is selected from $\mathcal{A}_1$, $A_2$ is selected from $\mathcal{A}_2$, and so on, until $A_n$ is selected from $\mathcal{A}_n$) is independent, in the sense that it satisfies the relation

(1.11)  $$P[A_1 A_2 \cdots A_n] = P[A_1]\, P[A_2] \cdots P[A_n].$$

As an illustration of the fact that independent events occur in families, let us consider two independent events, A and B, which are defined on a sample description space S. Define the families $\mathcal{A}$ and $\mathcal{B}$ by

(1.12)  $$\mathcal{A} = \{A, A^c, S, \emptyset\}, \qquad \mathcal{B} = \{B, B^c, S, \emptyset\},$$

so that $\mathcal{A}$ consists of A, its complement $A^c$, the certain event S, and the impossible event ∅, and, similarly, $\mathcal{B}$ consists of B, $B^c$, S, and ∅.

We now show that if the events A and B are independent then the families of events $\mathcal{A}$ and $\mathcal{B}$ defined by (1.12) are independent. In order to prove this assertion, we must verify the validity of (1.11) with n = 2 for each pair of events, one from each family, that may be chosen. Since each family has four members, there are sixteen such pairs. We verify (1.11) for only four of these pairs, namely (A, B), (A, $B^c$), (A, S), and (A, ∅), and leave to the reader the verification of (1.11) for the remaining twelve pairs. We have that A and B satisfy (1.11) by hypothesis. Next, we show that A and $B^c$ satisfy (1.11). By (5.2) of Chapter 1, $P[AB^c] = P[A] - P[AB]$. Since, by hypothesis, P[AB] = P[A]P[B], it follows that

$$P[AB^c] = P[A](1 - P[B]) = P[A]\, P[B^c],$$

for by (5.3) of Chapter 1 $P[B^c] = 1 - P[B]$. Next, A and S satisfy (1.11), since AS = A and P[S] = 1, so that P[AS] = P[A] = P[A]P[S]. Next, A and ∅ satisfy (1.11), since A∅ = ∅ and P[∅] = 0, so that P[A∅] = P[∅] = P[A]P[∅] = 0.

More generally, by the same considerations, we may prove the following 
important theorem, which expresses (1.9) in a very concise form. 


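THEOREM. Let $A_1, A_2, \ldots, A_n$ be n events on a probability space. The events $A_1, A_2, \ldots, A_n$ are independent if and only if the families of events

$$\mathcal{A}_1 = \{A_1, A_1^c, S, \emptyset\}, \quad \mathcal{A}_2 = \{A_2, A_2^c, S, \emptyset\}, \quad \ldots, \quad \mathcal{A}_n = \{A_n, A_n^c, S, \emptyset\}$$

are independent.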
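For n = 2 the text verifies four of the sixteen pairs drawn from the families (1.12) and leaves twelve to the reader; on a finite space all sixteen can be checked mechanically. A minimal sketch, assuming two tosses of a fair coin as a hypothetical space, with A = "heads on toss 1" and B = "heads on toss 2":

```python
from fractions import Fraction
from itertools import product

space = list(product("HT", repeat=2))           # two fair coin tosses
prob = {s: Fraction(1, 4) for s in space}
S, empty = frozenset(space), frozenset()
P = lambda E: sum((prob[s] for s in E), Fraction(0))

A = frozenset(s for s in space if s[0] == "H")
B = frozenset(s for s in space if s[1] == "H")

def family(E):
    """The family {E, E^c, S, empty set}, as in (1.12) and the theorem."""
    return [E, S - E, S, empty]

pairs = list(product(family(A), family(B)))
assert len(pairs) == 16
for E, F in pairs:
    assert P(E & F) == P(E) * P(F)              # (1.11) with n = 2
```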


THEORETICAL EXERCISES

1.1. Consider n independent events $A_1, A_2, \ldots, A_n$. Show that

$$P[A_1 \cup A_2 \cup \cdots \cup A_n] = 1 - P[A_1^c]\, P[A_2^c] \cdots P[A_n^c].$$

Consequently, obtain the probability that in 6 independent tosses of a fair die the number 3 will appear at least once. Answer: $1 - (5/6)^6$.

1.2. Let the events $A_1, A_2, \ldots, A_n$ be independent and $P[A_i] = p_i$ for i = 1, ..., n. Let $P_0$ be the probability that none of the events will occur. Show that $P_0 = (1 - p_1)(1 - p_2) \cdots (1 - p_n)$.

1.3. Let the events $A_1, A_2, \ldots, A_n$ be independent and have equal probability $P[A_i] = p$. Show that the probability that exactly k of the events will occur is

(1.13)  $$\binom{n}{k} p^k q^{n-k}, \qquad q = 1 - p.$$

Hint: $P[A_1 \cdots A_k A_{k+1}^c \cdots A_n^c] = p^k q^{n-k}$.

1.4. The multiplicative rule for the probability of the intersection of n events $A_1, A_2, \ldots, A_n$. Show that, for n events for which $P[A_1 A_2 \cdots A_{n-1}] > 0$,

$$P[A_1 A_2 A_3 \cdots A_n] = P[A_1]\, P[A_2 \mid A_1]\, P[A_3 \mid A_1, A_2] \cdots P[A_n \mid A_1, A_2, \ldots, A_{n-1}].$$

1.5. Let A and B be independent events. In terms of P[A] and P[B], express, for k = 0, 1, 2, (i) P[exactly k of the events A and B will occur], (ii) P[at least k of the events A and B will occur], (iii) P[at most k of the events A and B will occur].

1.6. Let A, B, and C be independent events. In terms of P[A], P[B], and P[C], express, for k = 0, 1, 2, 3, (i) P[exactly k of the events A, B, C will occur], (ii) P[at least k of the events A, B, C will occur], (iii) P[at most k of the events A, B, C will occur].


EXERCISES

1.1. Let a sample of size 4 be drawn with replacement (without replacement) from an urn containing 6 balls, of which 4 are white. Let A denote the event that the ball drawn on the first draw is white, and let B denote the event that the ball drawn on the fourth draw is white. Are A and B independent? Prove your answers.

1.2. Let a sample of size 4 be drawn with replacement (without replacement) from an urn containing 6 balls, of which 4 are white. Let A denote the event that exactly 1 of the balls drawn on the first 2 draws is white. Let B be the event that the ball drawn on the fourth draw is white. Are A and B independent? Prove your answers.

1.3. (Continuation of 1.2). Let A and B be as defined in exercise 1.2. Let C be the event that exactly 2 white balls are drawn in the 4 draws. Are A, B, and C independent? Are B and C independent? Prove your answers.

1.4. Consider example 1E. Find the probability that (i) both A' and B' will state that car I stopped suddenly, (ii) neither A' nor C' will state that car I stopped suddenly, (iii) at least 1 of A', B', and C' will state that car I stopped suddenly.

1.5. A manufacturer of sports cars enters 3 drivers in a race. Let $A_1$ be the event that driver 1 "shows" (that is, he is among the first 3 drivers in the race to cross the finish line), let $A_2$ be the event that driver 2 shows, and let $A_3$ be the event that driver 3 shows. Assume that the events $A_1, A_2, A_3$ are independent and that $P[A_1] = P[A_2] = P[A_3] = 0.1$. Compute the probability that (i) none of the drivers will show, (ii) at least 1 will show, (iii) at least 2 will show, (iv) all of them will show.

1.6. Compute the probabilities asked for in exercise 1.5 under the assumption that $P[A_1] = 0.1$, $P[A_2] = 0.2$, $P[A_3] = 0.3$.

1.7. A manufacturer of sports cars enters n drivers in a race. For i = 1, ..., n let $A_i$ be the event that the ith driver shows (see exercise 1.5). Assume that the events $A_1, \ldots, A_n$ are independent and have equal probability $P[A_i] = p$. Show that the probability that exactly k of the drivers will show is $\binom{n}{k} p^k q^{n-k}$, where q = 1 − p.




1.8. Suppose you have to choose a team of 3 persons to enter a race. The rules of the race are that a team must consist of 3 people whose respective probabilities $p_1, p_2, p_3$ of showing must add up to 3; that is, $p_1 + p_2 + p_3 = 4$. What probabilities of showing would you desire the members of your team to have in order to maximize the probability that at least 1 member of your team will show? (Assume independence.)

1.9. Let A and B be 2 independent events such that the probability is 4 that they will occur simultaneously and } that neither of them will occur. Find P[A] and P[B]; are P[A] and P[B] uniquely determined?

1.10. Let A and B be 2 independent events such that the probability is ġ that they will occur simultaneously and } that A will occur and B will not occur. Find P[A] and P[B]; are P[A] and P[B] uniquely determined?


2. INDEPENDENT TRIALS 


The notion of independent families of events leads us next to the notion of independent trials. Let S be a sample description space of a random observation or experiment on which is defined a probability function P[·]. Suppose further that each description in S is an n-tuple. Then the random phenomenon which S describes is defined as consisting of n trials. For example, suppose one is drawing a sample of size n from an urn containing M balls. The sample description space of such an experiment consists of n-tuples. It is also useful to regard this experiment as a series of trials, in each of which a ball is drawn from the urn. Mathematically, the fact that in drawing a sample of size n one is performing n trials is expressed by the fact that the sample description space S consists of n-tuples $(z_1, z_2, \ldots, z_n)$; the first component $z_1$ represents the outcome of the first trial, the second component $z_2$ represents the outcome of the second trial, and so on, until $z_n$ represents the outcome of the nth trial.

We next define the im... S be a sample descri... an integer, 1 to n. We say that A depends on the kth ... of A depends only on the outcome ... in order to determine whether or not A ... of any trial.


SEC. 2 INDEPENDENT TRIALS 95 


► Example 2A. Suppose one is drawing a sample of size 2 from an urn
containing white and black balls. The event A that the first ball drawn is
white depends on the first trial. Similarly, the event B that the second ball
drawn is white depends on the second trial. However, the event C that
exactly one of the balls drawn is white does not depend on any one trial.
Note that one may express C in terms of A and B by $C = AB^c \cup A^cB$. ◄
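Because "A depends on the kth trial" is a condition on set membership, it can be checked mechanically on a finite space. The sketch below is only an illustration: it records each draw of Example 2A by color alone, assumes both colors can occur on either draw, and uses a hypothetical helper `depends_on_trial` that takes a 0-based trial index.

```python
from itertools import product

# Example 2A, recording each draw by color only ('W' = white, 'B' = black).
# Assumption: both colors can occur on either draw.
S = list(product("WB", repeat=2))

def depends_on_trial(event, k, space):
    """Return True if membership in `event` is decided by component k alone.

    Hypothetical helper; k is a 0-based index. Any two descriptions that agree
    in component k must be either both inside or both outside the event.
    """
    verdict = {}
    for z in space:
        inside = z in event
        if verdict.setdefault(z[k], inside) != inside:
            return False
    return True

A = {z for z in S if z[0] == "W"}        # first ball drawn is white
B = {z for z in S if z[1] == "W"}        # second ball drawn is white
C = {z for z in S if z.count("W") == 1}  # exactly one ball drawn is white

print(depends_on_trial(A, 0, S), depends_on_trial(B, 1, S))  # True True
print(depends_on_trial(C, 0, S), depends_on_trial(C, 1, S))  # False False
print(C == (A - B) | (B - A))                                # True
```

Here the set difference `A - B` plays the role of $AB^c$, so the last line checks the identity $C = AB^c \cup A^cB$.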


► Example 2B. Choose a summer day at random on which both the
Dodgers and the Giants are playing baseball games, but not with one
another. Let $z_1 = 1$ or 0, depending on whether the Dodgers win or lose
their game, and, similarly, let $z_2 = 1$ or 0, depending on whether the Giants
win or lose their game. The event A that the Dodgers win depends on the
first trial of the sample description space $S = \{(z_1, z_2): z_1 = 1 \text{ or } 0,\ z_2 = 1 \text{ or } 0\}$. ◄


We next define the very important notion of independent trials. Consider
a sample description space S consisting of n trials. For $k = 1, 2, \ldots, n$,
let $\mathcal{A}_k$ be the family of events on S that depend on the kth trial. We
define the n trials as independent (and we say that S consists of n independent
trials) if the families of events $\mathcal{A}_1, \mathcal{A}_2, \ldots, \mathcal{A}_n$ are independent. Otherwise,
the n trials are said to be dependent or nonindependent. More explicitly,
the n trials are said to be independent if (1.11) holds for every set of events
$A_1, A_2, \ldots, A_n$ such that, for $k = 1, 2, \ldots, n$, $A_k$ depends only on the
kth trial.

If the reader traces through the various definitions that have been made
in this chapter, it should become clear to him that the mathematical
definition of the notion of independent trials embodies the intuitive meaning
of the notion, which is that two trials (of the same or different experiments)
are independent if the outcome of one does not affect the outcome of
the other and are otherwise dependent.
In the foregoing definition of independent trials it was assumed that the
probability function P[·] was already defined on the sample description
space S, which consists of n-tuples. If this were the case, it is clear that to
establish that S consists of independent trials requires the verification of a
large number of relations of the form of (1.11). However, in practice, one
does not start with a probability function P[·] on S and then proceed to
verify all of the relations of the form of (1.11) in order to show that S
consists of n independent trials. Rather, the notion of independent trials
derives its importance from the fact that it provides an often-used method for
setting up a probability function on a sample description space. This is done
in the following way.*

* The remainder of this section may be omitted in a first reading of the book if the 
reader is willing to accept intuitively the ideas made precise here. 




Let $Z_1, Z_2, \ldots, Z_n$ be n sample description spaces (which may be alike)
on whose subsets, respectively, are defined probability functions $P_1$,
$P_2, \ldots, P_n$. For example, suppose we are drawing, with replacement, a
sample of size n from an urn containing N balls, numbered 1 to N. We
define (for $k = 1, 2, \ldots, n$) $Z_k$ as the sample description space of the
outcome of the kth draw; consequently, $Z_k = \{1, 2, \ldots, N\}$. If the
descriptions in $Z_k$ are assumed to be equally likely, then the probability
function $P_k$ is defined on the events $C_k$ of $Z_k$ by $P_k[C_k] = N[C_k]/N[Z_k]$.

Now suppose we perform in succession the n random experiments whose
sample description spaces are $Z_1, Z_2, \ldots, Z_n$, respectively. The sample
description space S of this series of n random experiments consists of
n-tuples $(z_1, z_2, \ldots, z_n)$, which may be formed by taking for the first
component $z_1$ any member of $Z_1$, by taking for the second component $z_2$ any
member of $Z_2$, and so on, until for the nth component $z_n$ we take any
member of $Z_n$. We introduce a notation to express these facts; we write
$S = Z_1 \otimes Z_2 \otimes \cdots \otimes Z_n$, which we read "S is the combinatorial product
of the spaces $Z_1, Z_2, \ldots, Z_n$." More generally, we define the notion of a
combinatorial product event on S. For any events $C_1$ on $Z_1$, $C_2$ on $Z_2$, ...,
and $C_n$ on $Z_n$, we define the combinatorial product event $C = C_1 \otimes C_2 \otimes \cdots \otimes C_n$
as the set of all n-tuples $(z_1, z_2, \ldots, z_n)$ which can be formed by taking for
the first component $z_1$ any member of $C_1$, for the second component $z_2$ any
member of $C_2$, and so on, until for the nth component $z_n$ we take any
member of $C_n$.

We now define a probability function P[·] on the subsets of S. For every
event C on S that is a combinatorial product event, so that
$C = C_1 \otimes C_2 \otimes \cdots \otimes C_n$ for some events $C_1, C_2, \ldots, C_n$, which belong,
respectively, to $Z_1, Z_2, \ldots, Z_n$, we define

(2.1) $P[C] = P_1[C_1]\, P_2[C_2] \cdots P_n[C_n]$.

Not every event in S is a combinatorial product event. However, it can
be shown that it is possible to define a unique probability function P[·] on
the events of S in such a way that (2.1) holds for combinatorial product
events.


It may help to clarify the meaning of the foregoing ideas if we consider
the special (but, nevertheless, important) case, in which each sample
description space $Z_1, Z_2, \ldots, Z_n$ is finite, of sizes $N_1, N_2, \ldots, N_n$,
respectively. As in section 6 of Chapter 1, we list the descriptions in
$Z_1, Z_2, \ldots, Z_n$: for $j = 1, \ldots, n$,

$Z_j = \{D_1^{(j)}, D_2^{(j)}, \ldots, D_{N_j}^{(j)}\}$.

Now let $S = Z_1 \otimes Z_2 \otimes \cdots \otimes Z_n$ be the sample description space of the
random experiment which consists in performing in succession the n
random experiments whose sample description spaces are $Z_1, Z_2, \ldots, Z_n$,
respectively. A typical description in S can be written $(D_{i_1}^{(1)}, D_{i_2}^{(2)}, \ldots, D_{i_n}^{(n)})$,
where, for $j = 1, \ldots, n$, $D_{i_j}^{(j)}$ represents a description in $Z_j$ and $i_j$ is some
integer, 1 to $N_j$. To determine a probability function P[·] on the subsets
of S, it suffices to specify it on the single-member events of S. Given
probability functions $P_1[\cdot], P_2[\cdot], \ldots, P_n[\cdot]$ defined on $Z_1, Z_2, \ldots, Z_n$,
respectively, we define P[·] on the subsets of S by defining

(2.2) $P[\{(D_{i_1}^{(1)}, D_{i_2}^{(2)}, \ldots, D_{i_n}^{(n)})\}] = P_1[\{D_{i_1}^{(1)}\}]\, P_2[\{D_{i_2}^{(2)}\}] \cdots P_n[\{D_{i_n}^{(n)}\}]$.

Equation (2.2) is a special case of (2.1), since a single-member event on S
can be written as a combinatorial product event; indeed,

(2.3) $\{(D_{i_1}^{(1)}, D_{i_2}^{(2)}, \ldots, D_{i_n}^{(n)})\} = \{D_{i_1}^{(1)}\} \otimes \{D_{i_2}^{(2)}\} \otimes \cdots \otimes \{D_{i_n}^{(n)}\}$.

► Example 2C. Let $Z_1 = \{H, T\}$ be the sample description space of the
experiment of tossing a coin, and let $Z_2 = \{1, 2, \ldots, 6\}$ be the sample
description space of the experiment of throwing a fair die. Let S be the
sample description space of the experiment which consists of first tossing
a coin and then throwing a die. What is the probability that in the jointly
performed experiment one will obtain heads on the coin toss and a 5 on
the die throw? The assumption made by (2.2) is that it is equal to the product
of (i) the probability that the outcome of the coin toss will be heads and
(ii) the probability that the outcome of the die throw will be a 5. ◄
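For finite spaces, rule (2.2) translates directly into code: each n-tuple receives the product of the probabilities of its components. The sketch below applies it to Example 2C. The fair coin is an assumption (the example only states that the die is fair), and `product_measure` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product
from math import prod

def product_measure(*marginals):
    """Assign probabilities on Z1 ⊗ ... ⊗ Zn by rule (2.2).

    Each marginal is a dict {description: probability} on one Z_j.
    Iterating over a dict yields its keys, so product() lists all n-tuples.
    """
    return {
        combo: prod(m[z] for m, z in zip(marginals, combo))
        for combo in product(*marginals)
    }

coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}      # assumed fair coin
die = {face: Fraction(1, 6) for face in range(1, 7)}   # fair die
P = product_measure(coin, die)

print(P[("H", 5)])       # 1/12
print(sum(P.values()))   # 1
```

Exact `Fraction` arithmetic is used so that the total probability comes out as exactly 1 rather than a rounded float.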


We now desire to show that the probability space, consisting of the
sample description space $S = Z_1 \otimes Z_2 \otimes \cdots \otimes Z_n$, on whose subsets a
probability function P[·] is defined by means of (2.1), consists of n independent
trials.

We first note that an event $A_k$ in S, which depends only on the kth trial,
is necessarily a combinatorial product event; indeed, for some event $C_k$
in $Z_k$,

(2.4) $A_k = Z_1 \otimes \cdots \otimes Z_{k-1} \otimes C_k \otimes Z_{k+1} \otimes \cdots \otimes Z_n$.

Equation (2.4) follows from the fact that an event $A_k$ depends on the kth
trial if and only if the decision as to whether or not a description
$(z_1, z_2, \ldots, z_n)$ belongs to $A_k$ depends only on the kth component $z_k$ of the
description. Next, let $A_1, A_2, \ldots, A_n$ be events depending, respectively,
on the first, second, ..., nth trial. For each $A_k$ we have a representation of
the form of (2.4). We next assert that the intersection may be written as a
combinatorial product event:

(2.5) $A_1 A_2 \cdots A_n = C_1 \otimes C_2 \otimes \cdots \otimes C_n$.

We leave the verification of (2.5), which requires only a little thought, to
the reader. Now, from (2.1) and (2.5)

(2.6) $P[A_1 A_2 \cdots A_n] = P_1[C_1]\, P_2[C_2] \cdots P_n[C_n]$,

whereas from (2.1) and (2.4), since $P_j[Z_j] = 1$ for every j,

(2.7) $P[A_k] = P_1[Z_1] \cdots P_{k-1}[Z_{k-1}]\, P_k[C_k]\, P_{k+1}[Z_{k+1}] \cdots P_n[Z_n] = P_k[C_k]$.

From (2.6) and (2.7) it is seen that (1.11) is satisfied, so that S consists of n
independent trials.
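The argument can also be checked by brute force in a small case. The sketch below is an illustration and not part of the original development: it builds P by (2.2) for n = 2, using a hypothetical biased coin with P[H] = 2/3 together with a fair die, and verifies (1.11) for every pair of events $A_1$, $A_2$ of the form (2.4).

```python
from fractions import Fraction
from itertools import chain, combinations, product
from math import prod

def product_measure(*marginals):
    # Rule (2.2): a single description gets the product of marginal probabilities.
    return {c: prod(m[z] for m, z in zip(marginals, c)) for c in product(*marginals)}

def subsets(items):
    """All subsets (as tuples) of a finite set, i.e. every event C_k on Z_k."""
    items = list(items)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

def prob(event, P):
    return sum((P[z] for z in event), Fraction(0))

Z1 = {"H": Fraction(2, 3), "T": Fraction(1, 3)}   # hypothetical biased coin
Z2 = {f: Fraction(1, 6) for f in range(1, 7)}     # fair die
P = product_measure(Z1, Z2)

def trials_are_independent():
    for C1 in subsets(Z1):
        for C2 in subsets(Z2):
            A1 = {z for z in P if z[0] in C1}   # A1 = C1 ⊗ Z2, depends on trial 1
            A2 = {z for z in P if z[1] in C2}   # A2 = Z1 ⊗ C2, depends on trial 2
            if prob(A1 & A2, P) != prob(A1, P) * prob(A2, P):
                return False
    return True

print(trials_are_independent())   # True
```

All 4 × 64 = 256 pairs of events are tested with exact fractions, so a `True` result is an exact check of (1.11) for this example, not a floating-point approximation.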

The foregoing considerations are not only sufficient to define a probability
space that consists of independent trials but are also necessary in
the sense of the following theorem, which we state without proof. Let the
sample description space S be a combinatorial product of n sample description
spaces $Z_1, Z_2, \ldots, Z_n$. Let P[·] be a probability function defined on the
subsets of S. The probability space S consists of n independent trials if and
only if there exist probability functions $P_1[\cdot], P_2[\cdot], \ldots, P_n[\cdot]$, defined,
respectively, on the subsets of the sample description spaces $Z_1, Z_2, \ldots, Z_n$,
with respect to which P[·] satisfies (2.6) for every set of n events $A_1, A_2, \ldots,
A_n$ on S such that, for $k = 1, \ldots, n$, $A_k$ depends only on the kth trial (and
then $C_k$ is defined by (2.4)).

To illustrate the foregoing considerations, we consider the following
example.


► Example 2D. A man tosses two fair coins independently. Let $C_1$ be
the event that the first coin tossed is a head, let $C_2$ be the event that the
second coin tossed is a head, and let C be the event that both coins tossed
are heads. Consider the sample description spaces $S = \{(H, H), (H, T),
(T, H), (T, T)\}$ and $Z_1 = Z_2 = \{H, T\}$. Clearly S is the sample description
space of the outcome of the two tosses, whereas $Z_1$ and $Z_2$ are the sample
description spaces of the outcome of the first and second tosses, respectively.
We assume that each of these sample description spaces has equally
likely descriptions.

The event $C_1$ may be defined on either S or $Z_1$. If defined on $Z_1$,
$C_1 = \{H\}$. If defined on S, $C_1 = \{(H, H), (H, T)\}$. The event $C_2$ may in a
similar manner be defined on either $Z_2$ or S. However, the event C can be
defined only on S; $C = \{(H, H)\}$.

The spaces on which $C_1$ and $C_2$ are defined determine the relation that
exists between $C_1$, $C_2$, and C. If both $C_1$ and $C_2$ are defined on S, then
$C = C_1 C_2$. If $C_1$ and $C_2$ are defined on $Z_1$ and $Z_2$, respectively, then
$C = C_1 \otimes C_2$.

In order to speak of the independence of $C_1$ and $C_2$, we must regard them
as being defined on the same sample description space. That $C_1$ and $C_2$ are
independent events is intuitively clear, since S consists of two independent
trials and $C_1$ depends on the first trial, whereas $C_2$ depends on the second
trial. Events can be independent without depending on independent trials.
For example, consider the event $D = \{(H, H), (T, T)\}$ that the two tosses
have the same outcome. One may verify that D and $C_1$ are independent
and also that D and $C_2$ are independent; for instance,
$P[DC_1] = P[\{(H, H)\}] = 1/4 = P[D]P[C_1]$. On the other hand, the events D,
$C_1$, and $C_2$ are not independent, since $P[DC_1C_2] = 1/4$, whereas
$P[D]P[C_1]P[C_2] = 1/8$. ◄
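The claims at the end of Example 2D can be confirmed by enumerating the four equally likely descriptions in S. The sketch below checks the three pairwise product rules and the triple product rule, using exact fractions.

```python
from fractions import Fraction
from itertools import product

S = list(product("HT", repeat=2))          # four equally likely descriptions
P = {z: Fraction(1, 4) for z in S}

def prob(event):
    return sum((P[z] for z in event), Fraction(0))

C1 = {z for z in S if z[0] == "H"}         # first coin is a head
C2 = {z for z in S if z[1] == "H"}         # second coin is a head
D = {("H", "H"), ("T", "T")}               # the two tosses agree

pairwise = all(
    prob(X & Y) == prob(X) * prob(Y) for X, Y in [(C1, C2), (D, C1), (D, C2)]
)
mutual = prob(D & C1 & C2) == prob(D) * prob(C1) * prob(C2)
print(pairwise, mutual)   # True False
```

The output shows pairwise independence without mutual independence: the triple intersection has probability 1/4 rather than 1/8.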


EXERCISES 


2.1. Consider a man who has made 2 tosses of a die. State whether each of the 
following six statements is true or false. 
Let $A_1$ be the event that the outcome of the first throw is a 1 or a 2.
Statement 1: $A_1$ depends on the first throw.
Let $A_2$ be the event that the outcome of the second throw is a 1 or a 2.
Statement 2: $A_1$ and $A_2$ are mutually exclusive events.
Let $B_1$ be the event that the sum of the outcomes is 7.
Statement 3: $B_1$ depends on the first throw.
Let $B_2$ be the event that the sum of the outcomes is 3.
Statement 4: $B_1$ and $B_2$ are mutually exclusive events.
Let C be the event that one of the outcomes is a 1 and the other is a 2.
Statement 5: $A_1 \cup A_2$ is a subevent of C.
Statement 6: C is a subevent of $B_2$.

2.2. Consider a man who has made 2 tosses of a coin. He assumes that the
possible outcomes of the experiment, together with their probability, are
given by the following table:


Sample Descriptions D   (H, H)   (H, T)   (T, H)   (T, T)


P[(D3] 3 é $ i 
Show that this probability space does not consist of 2 independent trials. 
Is there a unique probability function that must be assigned on the subsets 
of the foregoing sample description space in order that it consist of 2 
independent trials? 

2.3. Consider 3 urns; urn I contains 1 white and 2 black balls, urn II contains 
3 white and 2 black balls, and urn III contains 2 white and 3 black balls.
One ball is drawn from each urn. What is the probability that among the 
balls drawn there will be (i) 1 white and 2 black balls, (ii) at least 2 black 
balls, (iii) more black than white balls? 

2.4. If you had to construct a mathematical model for events A and B, as
described below, would it be appropriate to assume that A and B are
independent? Explain the reasons for your opinion.

(i) A is the event that a subscriber to a certain magazine owns a car, and B
is the event that the same subscriber is listed in the telephone directory.

(ii) A is the event that a married man has blue eyes, and B is the event that
his wife has blue eyes.

(iii) A is the event that a man aged 21 is more than 6 feet tall, and B is the
event that the same man weighs less than 150 pounds.

(iv) A is the event that a man lives in the Northern Hemisphere, and B is
the event that he lives in the Western Hemisphere.

(v) A is the event that it will rain tomorrow, and B is the event that it will
rain within the next week.

2.5. Explain the meaning of the following statements:

(i) A random phenomenon consists of n trials.
(ii) In drawing a sample of size n, one is performing n trials.
(iii) An event A depends on the third trial.
(iv) The event that the third ball drawn is white depends on the third trial.
(v) In drawing with replacement a sample of size 6, one is performing 6
independent trials of an experiment.
(vi) If S is the sample description space of the experiment of drawing with
replacement a sample of size 6 from an urn containing balls, numbered 1
to 10, then $S = Z_1 \otimes Z_2 \otimes \cdots \otimes Z_6$, in which $Z_j = \{1, 2, \ldots, 10\}$ for
$j = 1, \ldots, 6$.
(vii) If, in (vi), balls numbered 1 to 7 are white and if A is the event that all
balls drawn are white, then $A = C_1 \otimes C_2 \otimes \cdots \otimes C_6$, in which
$C_j = \{1, 2, \ldots, 7\}$ for $j = 1, \ldots, 6$.


3. INDEPENDENT BERNOULLI TRIALS 


Many problems in probability theory involve independent trials of an
experiment whose outcomes have been classified in two categories, called
"successes" and "failures" and represented by the letters s and f,
respectively. Such trials are called Bernoulli trials. The sample description
space of a Bernoulli trial is $Z = \{s, f\}$, on whose subsets is defined a
probability function $P_Z[\cdot]$ satisfying

(3.1) $P_Z[\{s\}] = p, \qquad P_Z[\{f\}] = q$,

in which p and q are nonnegative numbers with p + q = 1.

Consider now n independent repeated Bernoulli trials, in which the word
"repeated" is meant to indicate that the probabilities of success and failure
remain the same throughout the trials. The sample description space S of
n independent repeated Bernoulli trials contains $2^n$ descriptions, each an
n-tuple $(z_1, z_2, \ldots, z_n)$, in which each $z_i$ is either an s or an f. The sample
description space S is finite. However, to specify a probability function
P[·] on the subsets of S, we shall not assume that all descriptions in S are
equally likely. Rather, we shall use the ideas in section 2.

In order to specify a probability function P[·] on the subsets of S, it
suffices to specify it on the single-member events $\{(z_1, \ldots, z_n)\}$. However,
a single-member event may be written as a combinatorial product event;
indeed, $\{(z_1, \ldots, z_n)\} = \{z_1\} \otimes \cdots \otimes \{z_n\}$. Since it has been assumed that
$P_Z[\{s\}] = p$ and $P_Z[\{f\}] = q$, we obtain the following basic rule.*

If a probability space consists of n independent repeated Bernoulli trials,
then the probability $P[\{(z_1, \ldots, z_n)\}]$ of any single-member event is equal to
$p^k q^{n-k}$, in which k is the number of successes s among the components of the
description $(z_1, \ldots, z_n)$.


► Example 3A. Suppose that a man tosses ten times a possibly unfair
coin, whose probability of falling heads is p, which may be any number
between 0 and 1, inclusive, depending on the construction of the coin. On
each trial a success s is said to have occurred if the coin falls heads. Let
us find the probability of the event A that the coin will fall heads on the
first four tosses and tails on the last six tosses, assuming that the tosses are
independent. It is equal to $p^4 q^6$, since the event A is the same as the
single-member event $\{(s, s, s, s, f, f, f, f, f, f)\}$. ◄
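The basic rule says that the probability of a description depends only on how many s's it contains. A minimal sketch, in which `description_prob` is a hypothetical helper and descriptions are written as strings (or tuples) of 's' and 'f'; the value p = 1/3 is arbitrary.

```python
from fractions import Fraction
from itertools import product

def description_prob(description, p):
    """P[{(z1, ..., zn)}] = p^k q^(n-k), where k counts the 's' components."""
    n = len(description)
    k = description.count("s")
    return p**k * (1 - p) ** (n - k)

p = Fraction(1, 3)                         # an illustrative value of p
print(description_prob("ssssffffff", p))   # equals p**4 * q**6, as in Example 3A

# Summing over all 2^10 descriptions gives 1, as a probability function must.
print(sum(description_prob(d, p) for d in product("sf", repeat=10)))  # 1
```

The second check confirms that the basic rule does define a probability function on S.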


One usually encounters Bernoulli trials by considering a random event
E, whose probability of occurrence is p. In each trial one is interested only
in the occurrence or nonoccurrence of E. A success s corresponds to an
occurrence of the event E, and a failure f corresponds to a nonoccurrence
of E. Thus, for example, one may be tossing darts at a target, and E may
be the event that the target is hit; or one may be tossing a pair of dice, and
E may represent the event that the sum of the dice is 7 (for fair dice, p = 1/6);
or 3 men may be tossing coins simultaneously, and E may be the event that
all of the coins fall heads (for fair coins, p = 1/8); or a woman may be
pregnant, and E is the event that her child is a boy; or a man may be
celebrating his 21st birthday, and E may be the event that he will live to be
22 years old.
The Probability of k Successes in n Independent Repeated Bernoulli
Trials. Frequently, the only fact about the outcome of a succession of n
Bernoulli trials in which we are interested is the number of successes. We
now compute the probability that the number of successes will be k, for
any integer k from 0, 1, 2, ..., n. The event "k successes in n trials" can
happen in as many ways as k letters s may be distributed among n places;
this is the same as the number of subsets of size k that may be formed from
a set containing n members. Consequently, there are $\binom{n}{k}$ descriptions
containing exactly k successes and n - k failures. Each such description
has probability $p^k q^{n-k}$. Thus we have obtained a basic formula.

* A reader who has omitted the preceding section may take this rule as the definition
of the notion of n independent repeated Bernoulli trials.

The Binomial Law. The probability, denoted by b(k; n, p), that n
independent repeated Bernoulli trials, with probabilities p for success and
q = 1 - p for failure, will result in k successes and n - k failures (in
which k = 0, 1, ..., n) is given by

(3.2) $b(k; n, p) = \binom{n}{k} p^k q^{n-k}$.

The law expressed by (3.2) is called the binomial law because of the role
the quantities in (3.2) play in the binomial theorem, which states that

(3.3) $\sum_{k=0}^{n} \binom{n}{k} p^k q^{n-k} = (p + q)^n = 1$,

since p + q = 1.
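Both steps of the derivation are easy to check numerically: counting the descriptions with exactly k successes, then summing (3.2) over k as in (3.3). A sketch using Python's standard `math.comb` for the binomial coefficient; the values n = 6 and p = 1/4 are arbitrary.

```python
from fractions import Fraction
from itertools import product
from math import comb

def binomial_pmf(k, n, p):
    """b(k; n, p) from (3.2)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 6, Fraction(1, 4)
# The counting step: the number of descriptions with exactly k s's is C(n, k).
counts = [sum(1 for d in product("sf", repeat=n) if d.count("s") == k)
          for k in range(n + 1)]
print(counts)                                             # [1, 6, 15, 20, 15, 6, 1]
print(sum(binomial_pmf(k, n, p) for k in range(n + 1)))   # 1, as in (3.3)
```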

The reader should note that (3.2) is very similar to (3.4) of Chapter 2. 
However, (3.2) represents the solution to a probability problem that does 
not involve equally likely descriptions. The importance of this fact is 
illustrated by the following example. Suppose one is throwing darts at a 
target. It is difficult to see how one could compute the probability of the 
event E that one will hit the target by setting up some appropriate sample
description space with equally likely descriptions. Rather, p may have to
be estimated approximately by means of the frequency definition of
probability. Nevertheless, even though p cannot be computed, once one
has assumed a value for p one can compute by the methods of this section
the probability of any event A that can be expressed in terms of independent
trials of the event E.

The reader should also note that (3.2) is very similar to (1.13). By means 
of the considerations of section 2, it can be seen that (3.2) and (1.13) are 
equivalent formulations of the same law. 

The binomial law, and consequently the quantity b(k; n, p), occurs
frequently in applications of probability theory. The quantities b(k; n, p),
k = 0, 1, ..., n, are tabulated for p = 0.01 (0.01) 0.50 and n = 2(1) 49
(that is, for all values of p and n in the ranges p = 0.01, 0.02, 0.03, ...,
0.50 and n = 2, 3, 4, ..., 49) in "Tables of the Binomial Probability
Distribution," National Bureau of Standards, Applied Mathematics Series
6, Washington, 1950. For illustrative purposes, values of b(k; n, p) for various p
between 0.01 and 0.5 and for n = 2, 3, ..., 10 are given in Table II on
p. 442. It should be noted that values of b(k; n, p) for p > 0.5 can be
obtained from Table II by means of the formula

(3.4) $b(k; n, p) = b(n - k; n, 1 - p)$.
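Formula (3.4) simply exchanges the roles of success and failure. A short check with exact fractions follows; the values p = 4/5 and n = 10 are arbitrary, and `b_via_symmetry` is a hypothetical name for the table-lookup route.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def b_via_symmetry(k, n, p):
    """(3.4): b(k; n, p) = b(n - k; n, 1 - p), useful when p > 0.5."""
    return binomial_pmf(n - k, n, 1 - p)

n, p = 10, Fraction(4, 5)
print(all(binomial_pmf(k, n, p) == b_via_symmetry(k, n, p) for k in range(n + 1)))  # True
```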


► Example 3B. By a series of tests of a certain type of electrical relay,
it has been determined that in approximately 5% of the trials the relay will
fail to operate under certain specified conditions. What is the probability
that in ten trials made under these conditions the relay will fail to operate
one or more times?

Solution: To describe the results of the ten trials, we write a 10-tuple
$(z_1, z_2, \ldots, z_{10})$ whose kth component $z_k$ = s or f, depending on whether
the relay did or did not operate on the kth trial. We next assume that the
ten trials constitute ten independent repeated Bernoulli trials, with
probability of success p = 0.95 at each trial. The probability of no failures
in the ten trials is $b(10; 10, 0.95) = (0.95)^{10} = b(0; 10, 0.05)$. Consequently,
the probability of one or more failures in the ten trials is equal to

$1 - (0.95)^{10} = 1 - b(0; 10, 0.05) = 1 - 0.5987 = 0.4013$. ◄
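The arithmetic of Example 3B, sketched in Python. Floating point is used, so the result is rounded to four places as in the text.

```python
from math import comb

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p_fail = 10, 0.05   # ten trials; failure probability estimated at 5%
# "No failures" has probability b(0; 10, 0.05) = (0.95)^10; take the complement.
p_one_or_more = 1 - binomial_pmf(0, n, p_fail)
print(round(p_one_or_more, 4))   # 0.4013
```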


► Example 3C. How to tell skill from luck. A rather famous personage
in statistical circles is the tea-tasting lady whose claims have been discussed
by such outstanding scholars as R. A. Fisher and J. Neyman; see J.
Neyman, First Course in Probability and Statistics, Henry Holt, New York,
1950, pp. 272-289. "A Lady declares that by tasting a cup of tea made
with milk she can discriminate whether the milk or the tea infusion was
first added to the cup." Specifically, the lady's claim is "not that she could
draw the distinction with invariable certainty, but that, though sometimes
mistaken, she would be right more often than not." To test the lady's
claim, she will be subjected to an experiment. She will be required to
taste and classify n pairs of cups of tea, each pair containing one cup of
tea made by each of the two methods under consideration. Let p be the
probability that the lady will correctly classify a pair of cups. Assuming
that the n pairs of cups are classified under independent and identical
conditions, the probability that the lady will correctly classify k of the n
pairs is $\binom{n}{k} p^k q^{n-k}$. Suppose that it is decided to grant the lady's claims if
she correctly classifies at least eight of ten pairs of cups. Let P(p) be the
probability of granting the lady's claims, given that her true probability of
classifying a pair of cups is p. Then

$P(p) = \binom{10}{8} p^8 q^2 + \binom{10}{9} p^9 q + p^{10}$,

since P(p) is equal to the probability that the lady will correctly classify at
least eight of ten pairs. In particular, the probability that the lady will
establish her claim, given that she is skillful (say, p = 0.85), is given by
P(0.85) = 0.820, whereas the probability that the lady will establish her
claim, given that she is merely lucky (that is, p = 0.50), is given by P(0.50) =
0.055. ◄
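The function P(p) of Example 3C is a binomial tail sum, so its two quoted values can be reproduced directly. In this sketch `grant_probability` is a hypothetical name, and the defaults n = 10 and threshold = 8 encode the decision rule described above.

```python
from math import comb

def grant_probability(p, n=10, threshold=8):
    """P(p): probability of at least `threshold` correct classifications out of n."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(threshold, n + 1))

print(round(grant_probability(0.85), 3))   # 0.82
print(round(grant_probability(0.50), 3))   # 0.055
```

Raising `threshold` lowers the chance that a merely lucky taster passes, but it also lowers the chance that a skillful one passes. The single function makes that trade-off easy to explore.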


► Example 3D. The game of "odd man out." Let N distinguishable coins
be tossed simultaneously and independently, where N ≥ 3. Suppose that
each coin has probability p of falling heads. What is the probability that
either exactly one of the coins will fall heads or exactly one of the coins
will fall tails?

Application: In a game, which we shall call "odd man out," N persons
toss coins to determine one person who will buy refreshments for the
group. If there is a person in the group whose outcome (be it heads or
tails) is not the same as that of any other member of the group, then that
person is called an odd man and must buy refreshments for each member of
the group. The probability asked for in this example is the probability
that in any play of the game there will be an odd man. The next example is
concerned with how many plays of the game will be required to determine
an odd man.

Solution: To describe the results of the N tosses, we write an N-tuple
$(z_1, z_2, \ldots, z_N)$ whose kth component is s or f, depending on whether the
kth coin tossed fell heads or tails. We are then considering N independent
repeated Bernoulli trials, with probability p of success at each trial. The
probability of exactly one success is $\binom{N}{1} p q^{N-1}$, whereas the probability of
exactly one failure is $\binom{N}{N-1} p^{N-1} q$. Since N ≥ 3, these two events are
mutually exclusive. Consequently, the probability that either exactly one of
the coins will fall heads or exactly one of the coins will fall tails is equal to
$N(p^{N-1} q + p q^{N-1})$. If the coins are fair, so that p = 1/2, then the
probability is $N/2^{N-1}$. Thus, if five persons play the game of "odd man out"
with fair coins, the probability that in any play of the game there will be a
loser is 5/16. ◄
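The closed form in Example 3D can be checked against a direct enumeration of all $2^N$ outcomes. The sketch below does that with exact fractions. Both function names are hypothetical, and the enumeration relies on N ≥ 3 so that "exactly one head" and "exactly one tail" never describe the same outcome.

```python
from fractions import Fraction
from itertools import product

def odd_man_formula(N, p):
    """Example 3D: N (p^(N-1) q + p q^(N-1)); requires N >= 3."""
    q = 1 - p
    return N * (p ** (N - 1) * q + p * q ** (N - 1))

def odd_man_enumerated(N, p):
    """Sum P over all N-tuples with exactly one head or exactly one tail."""
    q = 1 - p
    total = Fraction(0)
    for outcome in product("HT", repeat=N):
        heads = outcome.count("H")
        if heads in (1, N - 1):
            total += p**heads * q ** (N - heads)
    return total

print(odd_man_formula(5, Fraction(1, 2)))   # 5/16
```

For N = 2 the two cases coincide, which is why the formula would double-count there.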


► Example 3E. The duration of the game of "odd man out." Let N persons
play the game of "odd man out" with fair coins. What is the probability
for n = 1, 2, ... that n plays will be required to conclude the game (that
is, the nth play is the first play in which one of the players will have an
outcome on his coin toss different from those of all the other players)?

Solution: Let us rephrase the problem. (See theoretical exercise 3.3.)
Suppose that n independent plays are made of the game of "odd man out."
What is the probability that on the nth play, but not on any preceding play,
there will be an odd man? Let P be the probability that on any play there
will be an odd man. In example 3D it was shown that $P = N/2^{N-1}$ if N
persons are tossing fair coins. Let Q = 1 - P. To describe the results of
n plays, we write an n-tuple $(z_1, z_2, \ldots, z_n)$ whose kth component is s or f,
depending on whether the kth play does or does not result in an odd man.
Assuming that the plays are independent, the n plays thus constitute
repeated independent Bernoulli trials with probability $P = N/2^{N-1}$ of
success at each trial. Consequently, the event $\{(f, f, \ldots, f, s)\}$ of failure
at all trials but the nth has probability $Q^{n-1} P$. Thus, if five persons toss
fair coins, the probability that four tosses will be required to produce an
odd man is $(11/16)^3 (5/16)$. ◄
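The answer $Q^{n-1}P$ is easy to tabulate. A minimal sketch under the example's assumptions (N ≥ 3 players, fair coins, independent plays), with a hypothetical function name:

```python
from fractions import Fraction

def first_odd_man_on_play(n, N):
    """Example 3E: probability that play n is the first to produce an odd man.

    Assumes N >= 3 players with fair coins and independent plays.
    """
    P = Fraction(N, 2 ** (N - 1))   # chance of an odd man on one play (Example 3D)
    Q = 1 - P
    return Q ** (n - 1) * P

print(first_odd_man_on_play(4, 5))   # 6655/65536, i.e. (11/16)^3 (5/16)
```

Summed over n = 1, 2, ..., these probabilities approach 1. In other words, with probability 1 the game eventually ends.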

Various approximations that exist for computing the binomial proba- 
bilities are discussed in section 2 of Chapter 6. We now briefly indicate 
the nature of one of these approximations, namely, that of the binomial 
probability law by the Poisson probability law. 

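The Poisson Law. A random phenomenon whose sample description
space S consists of all the integers from 0 onward, so that S = {0, 1, 2, ...},
and on whose subsets a probability function P[·] is defined in terms of a
parameter λ > 0 by

(3.5) $P[\{k\}] = e^{-\lambda} \dfrac{\lambda^k}{k!}, \qquad k = 0, 1, 2, \ldots$,

is said to obey the Poisson probability law with parameter λ. Examples of
random phenomena that obey the Poisson probability law are given in
section 3 of Chapter 6. For the present, let us show that under certain
circumstances the number of successes in n independent repeated Bernoulli
trials, with probability of success p at each trial, approximately obeys the
Poisson probability law with parameter λ = np.

More precisely, we show that for any fixed k = 0, 1, 2, ..., and λ > 0

(3.6) $\lim_{n \to \infty} \binom{n}{k} \left(\dfrac{\lambda}{n}\right)^k \left(1 - \dfrac{\lambda}{n}\right)^{n-k} = e^{-\lambda} \dfrac{\lambda^k}{k!}$.

To prove (3.6), we need only rewrite its left-hand side:

$\binom{n}{k} \left(\dfrac{\lambda}{n}\right)^k \left(1 - \dfrac{\lambda}{n}\right)^{n-k} = \dfrac{\lambda^k}{k!} \left(1 - \dfrac{\lambda}{n}\right)^{n} \dfrac{n(n-1)\cdots(n-k+1)}{n^k} \left(1 - \dfrac{\lambda}{n}\right)^{-k}$.

Since $\lim_{n \to \infty} [1 - (\lambda/n)]^n = e^{-\lambda}$, while for fixed k the last two factors
tend to 1 as n → ∞, we obtain (3.6).
Since (3.6) holds in the limit, we may write that it is approximately true
for large values of n that

(3.7) $\binom{n}{k} p^k (1 - p)^{n-k} \approx e^{-np} \dfrac{(np)^k}{k!}$.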


We shall not consider here the remainder terms for the determination of the
accuracy of the approximation formula (3.7). In practice, the approximation
represented by (3.7) is used if p ≤ 0.1. A short table of the Poisson
probabilities defined in (3.5) is given in Table III (see p. 444).
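The limit (3.6) can be watched numerically by holding λ = np fixed while n grows. The sketch below uses arbitrary choices λ = 2 and k = 3 and prints the absolute gap between the binomial probability and the Poisson value (3.5). The gap shrinks as n increases; the sketch does not attempt to bound the remainder.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """(3.5): P[{k}] = e^(-lam) lam^k / k!."""
    return exp(-lam) * lam**k / factorial(k)

lam, k = 2.0, 3
errors = []
for n in (10, 100, 1000, 10000):
    p = lam / n                      # keep np = lam fixed, as in (3.6)
    errors.append(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)))
    print(n, p, errors[-1])
```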


► Example 3F. It is known that the probability that an item produced
by a certain machine will be defective is 0.1. Let us find the probability
that a sample of ten items, selected at random from the output of
the machine, will contain no more than one defective item. The
required probability, based on the binomial law, is
$\binom{10}{0}(0.1)^0(0.9)^{10} + \binom{10}{1}(0.1)^1(0.9)^9 = 0.7361$, whereas the Poisson
approximation given by (3.7), with np = 1, yields the value $e^{-1} + e^{-1} = 0.7358$. ◄
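Example 3F's two numbers, recomputed in a short sketch (floating point, rounded to four places):

```python
from math import comb, exp, factorial

n, p = 10, 0.1
# Binomial: P[no more than one defective] = b(0; 10, 0.1) + b(1; 10, 0.1)
binomial = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in (0, 1))
# Poisson approximation (3.7) with np = 1
poisson = sum(exp(-n * p) * (n * p) ** k / factorial(k) for k in (0, 1))
print(round(binomial, 4), round(poisson, 4))   # 0.7361 0.7358
```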


► Example 3G. Safety testing vaccine. Suppose that at a certain stage
in the production process of a vaccine the vaccine contains, on the average,
m live viruses per cubic centimeter and the constant m is known to us.
Consequently, let it be assumed that in a large vat containing V cubic
centimeters of vaccine there are n = mV viruses. Let a sample of vaccine
be drawn from the vat; the sample's volume is v cubic centimeters. Let
us find for k = 0, 1, ..., the probability that the sample will contain k
viruses. Let us write an n-tuple $(z_1, z_2, \ldots, z_n)$ to describe the location of
the n viruses in the vat, the jth component $z_j$ being equal to s or f, depending
on whether the jth virus is or is not located in our sample. The probability
p that a virus in the vat will be in our sample may be taken as the ratio of
the volume of the sample to the volume of the vat, p = v/V, if it is assumed
that the viruses are dispersed uniformly in the vat. Assuming further that
the viruses are independently dispersed in the vat, it follows by the binomial
law that the probability P[{k}] that the sample will contain exactly k
viruses is given by

(3.8) $P[\{k\}] = \binom{mV}{k} \left(\dfrac{v}{V}\right)^k \left(1 - \dfrac{v}{V}\right)^{mV - k}$.

If it is assumed that the sample has a volume v less than 1% of the volume
V of the vat, then by the Poisson approximation to the binomial law, with
np = mV(v/V) = vm,

(3.9) $P[\{k\}] = e^{-vm} \dfrac{(vm)^k}{k!}$.

As an application of this result, let us consider a vat of vaccine that
contains five viruses per 1000 cubic centimeters. Then m = 0.005. Let a
sample of volume v = 600 cubic centimeters be taken. We are interested
in determining the probability P[{0}] that the sample will contain no viruses.
This problem is of great importance in the design of a scheme to safety-test
vaccine, for if the sample contains no viruses one might be led to pass as
virus free the entire contents of the vat of vaccine from which the sample
was drawn. By (3.9) we have

(3.10) $P[\{0\}] = e^{-vm} = e^{-(0.005)(600)} = e^{-3} = 0.0498$.

Let us attempt to interpret this result. If we desire to produce virus-free
vaccine, we must design a production process so that the density m of
viruses in the vaccine is 0. As a check that the production process is
operating properly, we sample the vaccine produced. Now, (3.10) implies
that when judging a given vat of vaccine it is not sufficient to rely merely on
the sample from that vat, if we are taking samples of volume 600 cubic
centimeters, since 5% of the samples drawn from vats with virus densities
m = 0.005 viruses per cubic centimeter will yield the conclusion that no
viruses are present in the vat. One way of decreasing this probability of a
wrong decision might be to take into account the results of recent safety
tests on similar vats of vaccine. ◄
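The vat volume V does not appear in (3.10), but the exact binomial value (3.8) does depend on it. The sketch below compares the two for k = 0, using a hypothetical vat of V = 100,000 cubic centimeters. With that value, v is under 1% of V and n = mV = 500. The function names are illustrative only.

```python
from math import exp

def p_no_virus_binomial(m, V, v):
    """(3.8) with k = 0: (1 - v/V)^(mV). Assumes mV is a whole number of viruses."""
    n = round(m * V)
    return (1 - v / V) ** n

def p_no_virus_poisson(m, v):
    """(3.9) with k = 0, i.e. (3.10): e^(-vm)."""
    return exp(-v * m)

m, v = 0.005, 600          # values from the text
V = 100_000                # hypothetical vat volume in cubic centimeters
print(round(p_no_virus_poisson(m, v), 4))    # 0.0498
print(round(p_no_virus_binomial(m, V, v), 4))
```

The two values agree to within about 0.0005 here, which is consistent with the text's use of the Poisson form when v is small relative to V.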


Independent Trials with More Than 2 Possible Outcomes. In the foregoing
we considered independent trials of a random experiment with just
two possible outcomes. It is natural to consider next the independent
trials of an experiment with several possible outcomes, say r possible
outcomes, in which r is an integer greater than 2. For the sample
description space of the outcomes of a particular trial we write $Z = \{s_1, s_2, \ldots, s_r\}$.
We assume that we know positive numbers $p_1, p_2, \ldots, p_r$, whose sum is 1,
such that at each trial $p_k$ represents the probability that $s_k$ will be the
outcome of that trial. In symbols, there exist numbers $p_1, p_2, \ldots, p_r$ such
that

(3.11) $0 < p_k < 1$ for $k = 1, 2, \ldots, r$; $\quad p_1 + p_2 + \cdots + p_r = 1$; $\quad P_Z[\{s_k\}] = p_k$ for $k = 1, 2, \ldots, r$.


► Example 3H. Consider an experiment in which two fair dice are tossed.
Consider three possible outcomes, $s_1$, $s_2$, and $s_3$, defined as follows:
if the sum of the two dice is five or less, we say that $s_1$ is the outcome;
if the sum of the two dice is six, seven, or eight, we say $s_2$ is the
outcome; if the sum of the two dice is nine or more, we say $s_3$ is the
outcome. Then $p_1 = \frac{5}{18}$, $p_2 = \frac{4}{9}$, $p_3 = \frac{5}{18}$. ◄
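The values of $p_1$, $p_2$, $p_3$ in Example 3H can be confirmed by listing the 36 equally likely ordered outcomes of two fair dice. This is a small sketch; the function name outcome_probabilities is hypothetical, and exact fractions are used so that the result can be compared with the text without rounding.

```python
from fractions import Fraction
from itertools import product

def outcome_probabilities():
    """Return (p1, p2, p3) for Example 3H by enumerating all 36 tosses."""
    counts = {"s1": 0, "s2": 0, "s3": 0}
    for a, b in product(range(1, 7), repeat=2):
        total = a + b
        if total <= 5:
            counts["s1"] += 1      # sum five or less
        elif total <= 8:
            counts["s2"] += 1      # sum six, seven, or eight
        else:
            counts["s3"] += 1      # sum nine or more
    return tuple(Fraction(counts[k], 36) for k in ("s1", "s2", "s3"))

p1, p2, p3 = outcome_probabilities()
print(p1, p2, p3)   # 5/18 4/9 5/18
```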

Let S be the sample description space of n independent repeated
trials of the experiment described. There are $r^n$ descriptions in S. The
probability $P[\{(z_1, z_2, \ldots, z_n)\}]$ of any single-member event is equal to
$p_1^{k_1} p_2^{k_2} \cdots p_r^{k_r}$, in which $k_1, k_2, \ldots, k_r$ denote, respectively, the number
of occurrences of $s_1, s_2, \ldots, s_r$ among the components of the description
$(z_1, z_2, \ldots, z_n)$.


108 INDEPENDENCE AND DEPENDENCE CH. 3 


Corresponding to the binomial law, we have the multinomial law: the
probability that in n trials the outcome $s_1$ will occur $k_1$ times, the outcome
$s_2$ will occur $k_2$ times, ..., the outcome $s_r$ will occur $k_r$ times, for any
nonnegative integers $k_j$ satisfying the condition $k_1 + k_2 + \cdots + k_r = n$, is
given by

(3.12)  $$\frac{n!}{k_1!\,k_2!\cdots k_r!}\, p_1^{k_1} p_2^{k_2} \cdots p_r^{k_r}.$$

To prove (3.12), one must note only that the number of descriptions in
S which contain $k_1$ $s_1$'s, $k_2$ $s_2$'s, ..., $k_r$ $s_r$'s is equal to the number of ways a
set of size n can be partitioned into r ordered subsets of sizes $k_1, k_2, \ldots, k_r$,
respectively, which is equal to $\binom{n}{k_1\,k_2\,\cdots\,k_r} = \frac{n!}{k_1!\,k_2!\cdots k_r!}$. Each of these descriptions
has probability $p_1^{k_1} p_2^{k_2} \cdots p_r^{k_r}$. Consequently, (3.12) is proved. The name
"multinomial law" derives from the role played by the expressions given
in (3.12) in the multinomial theorem [see (1.18) of Chapter 2]. The reader
should note the similarity between (3.12) and (3.14) of Chapter 2; these
two equations are in the same relationship to each other as (3.2) and (3.4)
of Chapter 2.
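The multinomial law (3.12) translates directly into code. The sketch below (the function name multinomial_pmf is hypothetical) uses exact fractions and reuses the probabilities of Example 3H with n = 3 trials; the second check confirms that summing (3.12) over every $(k_1, k_2, k_3)$ with $k_1 + k_2 + k_3 = 3$ gives 1, as the multinomial theorem requires.

```python
from fractions import Fraction
from itertools import product
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Multinomial law (3.12): n!/(k_1! ... k_r!) * p_1^k_1 ... p_r^k_r."""
    n = sum(counts)
    coeff = factorial(n)
    for k in counts:
        coeff //= factorial(k)      # exact: n!/(k_1!...k_j!) is always an integer
    return coeff * prod(p**k for p, k in zip(probs, counts))

# Example 3H probabilities, n = 3 independent trials
probs = (Fraction(5, 18), Fraction(4, 9), Fraction(5, 18))
print(multinomial_pmf((1, 1, 1), probs))    # 50/243

# Summing (3.12) over all (k_1, k_2, k_3) with k_1 + k_2 + k_3 = 3 gives 1
total = sum(multinomial_pmf(ks, probs)
            for ks in product(range(4), repeat=3) if sum(ks) == 3)
print(total)    # 1
```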


THEORETICAL EXERCISES 


3.1. Suppose one makes n independent trials of an experiment whose probability
of success at each trial is p. Show that the conditional probability that
any given trial will result in a success, given that there are k successes in
the n trials, is equal to k/n.


3.2. Suppose one makes m + n independent trials of an experiment whose
probability of success at each trial is p. Let q = 1 − p.

(i) Show that for any k = 0, 1, ..., n, the conditional probability that
exactly m + k trials will result in success, given that the first m trials
result in success, is equal to $\binom{n}{k} p^k q^{n-k}$.

(ii) Show that the conditional probability that exactly m + k trials will
result in success, given that at least m trials result in success, is equal to

(3.13)  $$\frac{\binom{m+n}{m+k}\left(\frac{p}{q}\right)^{k}}{\sum_{r=0}^{n}\binom{m+n}{m+r}\left(\frac{p}{q}\right)^{r}}.$$
3.3. Suppose one performed a sequence of independent Bernoulli trials (in
which the probability of success at each trial is p) until the first success
occurs. Show for any integer n = 1, 2, ... that the probability that n
will be the number of trials required to achieve the first success is $pq^{n-1}$.
Note: Strictly speaking, this problem should be rephrased as follows.
Consider n independent Bernoulli trials, with probability p for success
on any trial. What is the probability that the nth trial will be the first
trial on which a success occurs? To show that the problem originally
stated is equivalent to the reformulated problem requires the consideration
of the theory of a countably infinite number of independent repeated
Bernoulli trials; this is beyond the scope of this book.

3.4. The behavior of the binomial probabilities. Show that, as k goes from 0
to n, the terms $b(k; n, p)$ increase monotonically, then decrease monotonically,
reaching their largest value (i) in the case that (n + 1)p is not an
integer, when k is equal to the integer m satisfying the inequalities

(3.14)  $$(n + 1)p - 1 < m < (n + 1)p,$$

and (ii) in the case that (n + 1)p is an integer, when k is equal to either
(n + 1)p − 1 or (n + 1)p. Hint: Use the fact that

(3.15)  $$\frac{b(k; n, p)}{b(k - 1; n, p)} = \frac{(n - k + 1)p}{kq} = 1 + \frac{(n + 1)p - k}{kq}.$$

3.5. Consider a series of n independent repeated Bernoulli trials in which the
probability of success at each trial is p. Show that in order to have two
distinct integers, $k_1$ and $k_2$, between 0 and n, such that the probability of
$k_1$ successes in the n trials will be equal to the probability of $k_2$ successes in
the n trials, it is necessary and sufficient that (n + 1)p be an integer.

3.6. Show that the probability [denoted by P(r + 1), say] of at least (r + 1)
successes in (n + 1) independent repeated Bernoulli trials, with probability
p of success at each trial, is equal to

$$P(r + 1) = \frac{(n + 1)!}{r!\,(n - r)!} \int_0^p x^r (1 - x)^{n - r}\, dx.$$

Hint: P(r + 1) may be regarded as a function of p for r and n fixed. By
differentiation, verify that

$$\frac{d}{dp}\, P(r + 1) = \frac{(n + 1)!}{r!\,(n - r)!}\, p^r q^{n - r}.$$

3.7. The behavior of the Poisson probabilities. Show that the probabilities of
the Poisson probability law, given by (3.5), increase monotonically, then
decrease monotonically as k increases, and reach their maximum when k
is the largest integer not exceeding λ.

3.8. The behavior of the multinomial probabilities. Show that the probabilities
of the multinomial probability law, given by (3.12), reach their maximum
at $k_1, k_2, \ldots, k_r$ satisfying the inequalities, for $i = 1, 2, \ldots, r$,

(3.17)  $$np_i - 1 < k_i \le (n + r - 1)p_i.$$

Hint: Prove first that the maximum is attained at and only at values
$k_1, \ldots, k_r$ satisfying $p_i k_j \le p_j(k_i + 1)$ for each pair of indices i and j.
Add these inequalities for all j and also for all $i \ne j$. (This result is taken
from W. Feller, An Introduction to Probability Theory and Its Applications,
Vol. I, 2nd ed., New York, Wiley, 1957, p. 161, where it is ascribed to
P. A. P. Moran.)


EXERCISES

3.1. Assuming that each child has probability 0.51 of being a boy, find the
probability that a family of 4 children will have (i) exactly 1 boy, (ii)
exactly 1 girl, (iii) at least 1 boy, (iv) at least 1 girl.

3.2. Find the number of children a couple should have in order that the
probability of their having at least 2 boys will be greater than 0.75.

3.3. Assuming that each dart has probability 0.20 of hitting its target, find the
probability that if one throws 5 darts at a target one will score (i) no hits,
(ii) exactly 1 hit, (iii) at least 2 hits.

3.4. Assuming that each dart has probability 0.20 of hitting its target, find
the number of darts one should throw at a target in order that the probability
of at least 2 hits will be greater than 0.60.

3.5. Consider a family with 4 children, and assume that each child has probability
0.51 of being a boy. Find the conditional probability that all the
children will be boys, given that (i) the eldest child is a boy, (ii) at least
1 of the children is a boy.

3.6. Assuming that each dart has probability 0.20 of hitting its target, find
the conditional probability of obtaining 2 hits in 5 throws, given that one
has scored an even number of hits in the 5 throws.

3.7. A certain manufacturing process yields electrical fuses, of which, in the
long run, 15% are defective. Find the probability that in a sample of 10
fuses selected at random there will be (i) no defectives, (ii) at least 1
defective, (iii) no more than 1 defective.

3.8. A machine normally makes items of which 5% are defective. The practice
of the producer is to check the machine every hour by drawing a sample of
size 10, which he inspects. If the sample contains no defectives, he allows
the machine to run for another hour. What is the probability that this
practice will lead him to leave the machine alone when in fact it has shifted
to producing items of which 10% are defective?

3.9. (Continuation of 3.8.) How large a sample should be inspected to ensure
that if p = 0.10 the probability that the machine will not be stopped is
less than or equal to 0.01?

3.10. Consider 3 friends who contract a disease; medical experience has shown
that 10% of people contracting this disease do not recover. What is the
probability that (i) none of the 3 friends will recover, (ii) all of them will
recover?

3.11. Let the probability that a person aged x years will survive 1 year be
denoted by $p_x$, whereas $q_x = 1 - p_x$ is the probability that he will die
within a year. Consider a board of directors, consisting of a chairman
and 5 members; all of the members are 60, the chairman is 65. Find the
probability, in terms of $q_{60}$ and $q_{65}$, that within a year (i) no members will

die, (ii) not more than 1 member will die, (iii) neither a member nor the
chairman will die, (iv) only the chairman will die. Evaluate these probabilities
under the assumption that $q_{60} = 0.025$ and $q_{65} = 0.040$.

3.12. Consider a young man who is waiting for a young lady, who is late. To
amuse himself while waiting, he decides to take a walk under the following
set of rules. He tosses a coin (which we may assume is fair). If the coin
falls heads, he walks 10 yards north; if the coin falls tails, he walks 10
yards south. He repeats this process every 10 yards and thus executes
what is called a "random walk." What is the probability that after
walking 100 yards he will be (i) back at his starting point, (ii) within 10
yards of his starting point, (iii) exactly 20 yards away from his starting
point?

3.13. Do the preceding exercise under the assumption that the coin tossed by
the young man is unfair and has probability 0.51 of falling heads (probability
0.49 of falling tails).

3.14. Let 4 persons play the game of "odd man out" with fair coins. What is the
probability, for n = 1, 2, ..., that n plays will be required to conclude the
game (that is, the nth play is the first play on which 1 of the players will
have an outcome on his coin toss that is different from those of all the
other players)?

3.15. Consider an experiment that consists of tossing 2 fair dice independently.
Consider a sequence of n repeated independent trials of the experiment.
What is the probability that the nth throw will be the first time that the
sum of the 2 dice is a 7?

3.16. A man wants to open his door; he has 5 keys, only 1 of which fits the door.
He tries the keys successively, choosing them (i) without replacement,
(ii) with replacement, until he opens the door. For each integer
k = 1, 2, ..., find the probability that the kth key tried will be the first to fit
the door.

3.17. A man makes 5 independent throws of a dart at a target. Let p denote
his probability of hitting the target at each throw. Given that he has
made exactly 3 hits in the 5 throws, what is the probability that the first
throw hit the target? Express your answer in terms as simple as you can.

3.18. Consider a loaded die; in 10 independent throws the probability that an
even number will appear 5 times is twice the probability that an even
number will appear 4 times. What is the probability that an even number
will not appear at all in 10 independent throws of the die?

3.19. An accident insurance company finds that 0.001 of the population incurs
a certain kind of accident each year. Assuming that the company has
insured 10,000 persons selected randomly from the population, what is
the probability that not more than 3 of the company's policyholders will
incur this accident in a given year?

3.20. A certain airline finds that 4 per cent of the persons making reservations
on a certain flight will not show up for the flight. Consequently, their
policy is to sell to 75 persons reserved seats on a plane that has exactly 73


seats. What is the probability that for every person who shows up for
the flight there will be a seat available?

3.21. Consider a flask containing 1000 cubic centimeters of vaccine drawn
from a vat that contains on the average 5 live viruses in every 1000 cubic
centimeters of vaccine. What is the probability that the flask contains (i)
exactly 5 live viruses, (ii) 5 or more live viruses?

3.22. The items produced by a certain machine may be classified in 4 grades,
A, B, C, and D. It is known that these items are produced in the following
proportions:

    Grade A   Grade B   Grade C   Grade D
      0.3       0.4       0.2       0.1

What is the probability that there will be exactly 1 item of each grade in a
sample of 4 items, selected at random from the output of the machine?

3.23. A certain door-to-door salesman sells 3 sizes of brushes, which he calls
large, extra large, and giant. He estimates that among the persons he calls
upon the probabilities are 0.4 that he will make no sale, 0.3 that he will
sell a large brush, 0.1 that he will sell an extra large brush, and 0.2 that he
will sell a giant brush. Find the probability that in 4 calls he will sell (i) no
brushes, (ii) 4 large brushes, (iii) at least 1 brush of each kind.

3.24. Consider a man who claims to be able to locate hidden sources of water
by use of a divining rod. To test his claim, he is presented with 10 covered
cans, 1 at a time; he must decide, by means of his divining rod, whether
each can contains water. What is the probability that the diviner will
make at least 7 correct decisions just by chance? Do you think that the
test described in this exercise is fairer than the test described in exercise
2.14 of Chapter 2? Will it make a difference if the diviner knows how
many of the cans actually contain water?

3.25. In their paper "Testing the claims of a graphologist," Journal of Personality,
Vol. 16 (1947), pp. 192-197, G. R. Pascal and B. Suttell describe an
experiment designed to evaluate the ability of a professional graphologist.
The graphologist claimed that she could distinguish the handwriting of
abnormal from that of normal persons. The experimenters selected 19
persons who had been diagnosed as psychotics by at least 2 psychiatrists.
For each of these persons a normal-control person was matched for age,
sex, and education. Handwriting samples from each pair of persons
were placed in a separate folder and presented to the graphologist, who
was able to identify correctly the sample of the psychotic in 6 of the 19
pairs.
(i) What is the probability that she would have been correct on at least
6 pairs just by chance?
(ii) How many correct judgments would the graphologist need to make
so that the probability of her getting at least that many correct by chance
is 5% or less?

3.26. Two athletic teams play a series of games; the first team winning 4 games
is the winner. The World Series is an example. Suppose that 1 of the
teams is stronger than the other and has probability p of winning each
game, independent of the outcomes of any other games. Assume that a
game cannot end in a tie. Show that the probabilities that the series will
end in 4, 5, 6, or 7 games are (i) if p = 2/3, 0.21, 0.30, 0.27, and 0.22,
respectively, and (ii) if p = 1/2, 0.125, 0.25, 0.3125, and 0.3125, respectively.


3.27. Suppose that 9 people, chosen at random, are asked if they favor a certain
proposal. Find the probability that a majority of the persons polled will
favor the proposal, given that 45% of the population favor the proposal.
3.28. Suppose that (i) 2, (ii) 3 restaurants compete for the same 10 patrons.
Find the number of seats each restaurant should have in order to have a
probability greater than 95% that it can serve all patrons who come to it
(assuming that all patrons arrive at the same time and choose, independently
of one another, each restaurant with equal probability).


3.29. A fair die is to be thrown 9 times. What is the most probable number of 
throws on which the outcome is (i) a 6, (ii) an even number? 


4. DEPENDENT TRIALS 


In section 4 of Chapter 2 the notion of conditional probability was
discussed for events defined on a sample description space on which a
probability function was defined. However, an important use of the notion
of conditional probability is to set up a probability function on the subsets of
a sample description space S, which consists of n trials that are dependent
(or, more correctly, nonindependent). In many applications of probability
theory involving dependent trials one will state one's assumptions about
the random phenomenon under consideration in terms of certain conditional
probabilities that suffice to specify the probability model of the
random phenomenon.


As in section 2, for $k = 1, 2, \ldots, n$ let $\mathcal{A}_k$ be the family of events on S
which depend on the kth trial. Consider an event A that may be written
as the intersection, $A = A_1 A_2 \cdots A_n$, of events $A_1, A_2, \ldots, A_n$, which
belong to $\mathcal{A}_1, \mathcal{A}_2, \ldots, \mathcal{A}_n$, respectively. Now suppose that a probability
function $P[\cdot]$ has been defined on the subsets of S and suppose that
$P[A] > 0$. Then, by the multiplicative rule given in the theoretical exercises
of section 4 of Chapter 2,

(4.1)  $$P[A] = P[A_1]\,P[A_2 \mid A_1]\,P[A_3 \mid A_1, A_2] \cdots P[A_n \mid A_1, A_2, \ldots, A_{n-1}].$$

Now, as shown in section 2, any event A that is a combinatorial product
event may be written as the intersection of n events, each depending on only
one trial. Further, as we pointed out there, a probability function defined
on the subsets of a space S consisting of n trials is completely determined
by its values on combinatorial product events.

Consequently, to know the value of P[A] for any event A it suffices to
know, for $k = 2, 3, \ldots, n$, the conditional probability $P[A_k \mid A_1, \ldots, A_{k-1}]$
of any event $A_k$ depending on the kth trial, given any events $A_1, A_2, \ldots,
A_{k-1}$ depending on the 1st, 2nd, ..., (k − 1)st trials, respectively; one
also must know $P[A_1]$ for any event $A_1$ depending on the first trial. In other
words, if one assumes a knowledge of

(4.2)  $$\begin{aligned} &P[A_1] \\ &P[A_2 \mid A_1] \\ &P[A_3 \mid A_1, A_2] \\ &\quad\vdots \\ &P[A_n \mid A_1, A_2, \ldots, A_{n-1}] \end{aligned}$$

for any events $A_1$ in $\mathcal{A}_1$, $A_2$ in $\mathcal{A}_2$, ..., $A_n$ in $\mathcal{A}_n$, one has thereby
specified the value of P[A] for any event A on S.


► Example 4A. Consider an urn containing M balls, of which $M_W$ are
white. Let a sample of size n < M be drawn without replacement. Let
us find the probability of the event that all the balls drawn will be white.
The problem was solved in section 3 of Chapter 2; here, let us see how
(4.2) may be used to provide insight into that solution. For $i = 1, \ldots, n$
let $A_i$ be the event that the ball drawn on the ith draw is white. We
are then seeking $P[A_1 A_2 \cdots A_n]$. It is intuitively appealing that the
conditional probability of drawing a white ball on the ith draw, given
that white balls were drawn on the preceding (i − 1) draws, is described for
$i = 2, \ldots, n$ by

(4.3)  $$P[A_i \mid A_1, A_2, \ldots, A_{i-1}] = \frac{M_W - (i - 1)}{M - (i - 1)},$$

since just before the ith draw there are M − (i − 1) balls in the urn, of
which $M_W - (i - 1)$ are white. Let us assume that (4.3) is valid; more
generally, we assume a knowledge of all the probabilities in (4.2) by means
of the assumption that, whatever the first (i − 1) choices, at the ith draw
each of the remaining M − i + 1 elements will have probability
1/(M − i + 1) of being chosen. Then, from (4.1) it follows that

(4.4)  $$P[A_1 A_2 \cdots A_n] = \frac{M_W (M_W - 1) \cdots (M_W - n + 1)}{M (M - 1) \cdots (M - n + 1)},$$

which agrees with (3.1) of Chapter 2 for the case of k = n. ◄
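The agreement claimed at the end of Example 4A can be checked mechanically: multiplying the conditional probabilities (4.3) as in (4.1) should give the same number as the direct count $\binom{M_W}{n} / \binom{M}{n}$ of all-white samples. This is a sketch with hypothetical function names; the urn sizes M = 10, $M_W$ = 6, n = 3 in the usage line are illustrative, not taken from the text.

```python
from fractions import Fraction
from math import comb

def all_white_chain_rule(M: int, M_W: int, n: int) -> Fraction:
    """Product of the conditional probabilities (4.3), combined as in (4.1)."""
    prob = Fraction(1)
    for i in range(1, n + 1):
        # before the ith draw: M - (i - 1) balls remain, M_W - (i - 1) of them white
        prob *= Fraction(M_W - (i - 1), M - (i - 1))
    return prob

def all_white_counting(M: int, M_W: int, n: int) -> Fraction:
    """Direct count: C(M_W, n) all-white samples out of C(M, n) equally likely ones."""
    return Fraction(comb(M_W, n), comb(M, n))

print(all_white_chain_rule(10, 6, 3))   # 1/6
print(all_white_counting(10, 6, 3))     # 1/6
```

When n exceeds $M_W$, one factor in the chain becomes 0 and comb returns 0, so both functions agree that an all-white sample is then impossible.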




Further illustrations of the specification of a probability function on
the subsets of a space of n dependent trials by means of conditional
probability functions of the form given in (4.2) are supplied in examples 4B
and 4C.


► Example 4B. Consider two urns; urn I contains five white and three
black balls, urn II, three white and seven black balls. One of the urns is
selected at random, and a ball is drawn from it. Find the probability that
the ball drawn will be white.

Solution: The sample description space of the experiment described
consists of 2-tuples $(z_1, z_2)$, in which $z_1$ is the number of the urn chosen and
$z_2$ is the "name" of the ball chosen. The probability function $P[\cdot]$ on the
subsets of S is specified by means of the functions listed in (4.2), with
n = 2, which the assumptions stated in the problem enable us to compute.
In particular, let $C_1$ be the event that urn I is chosen, and let $C_2$ be the
event that urn II is chosen. Then $P[C_1] = P[C_2] = \frac12$. Next, let B be the
event that a white ball is chosen. Then $P[B \mid C_1] = \frac58$ and $P[B \mid C_2] = \frac{3}{10}$.
The events $C_1$ and $C_2$ are the complements of each other. Consequently,
by (4.7) of Chapter 2,

(4.5)  $$P[B] = P[B \mid C_1]P[C_1] + P[B \mid C_2]P[C_2] = \frac58 \cdot \frac12 + \frac{3}{10} \cdot \frac12 = \frac{37}{80}. \quad ◄$$
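The computation (4.5) is the two-event case of what is later stated as (4.11). A small sketch of that rule, under the hypothetical name total_probability, reproduces 37/80 with exact fractions; the input check (positive probabilities that sum to 1) mirrors the requirement that the conditioning events be mutually exclusive and exhaustive.

```python
import math
from fractions import Fraction

def total_probability(priors, likelihoods):
    """P[B] = sum_i P[B | C_i] P[C_i] over mutually exclusive, exhaustive C_i."""
    if any(p <= 0 for p in priors) or not math.isclose(sum(priors), 1):
        raise ValueError("P[C_i] must be positive and sum to 1")
    return sum(l * p for l, p in zip(likelihoods, priors))

priors = [Fraction(1, 2), Fraction(1, 2)]          # P[C1], P[C2]: urn picked at random
likelihoods = [Fraction(5, 8), Fraction(3, 10)]    # P[B | C1], P[B | C2]
print(total_probability(priors, likelihoods))      # 37/80
```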


► Example 4C. A case of hemophilia.* The first child born to a certain
woman was a boy who had hemophilia. The woman, who had a long
family history devoid of hemophilia, was perturbed about having a second
child. She reassured herself by reasoning as follows. "My son obviously
did not inherit his hemophilia from me. Consequently, he is a mutant.
The probability that my second child will have hemophilia, if he is a boy,
is consequently the probability that he will be a mutant, which is a very
small number m (equal to, say, 1/100,000)." Actually, what is the conditional
probability that a second son will have hemophilia, given that the
first son had hemophilia?

* I am indebted to my esteemed colleague Lincoln E. Moses for the idea of this
example.

Solution: We write a 3-tuple $(z_1, z_2, z_3)$ to describe the history of the
family with respect to hemophilia. Let $z_1$ equal s or f,
depending on whether the mother is or is not a hemophilia carrier. Let
$z_2$ equal s or f, depending on whether the first son is or is not hemophilic.
Let $z_3$ equal s or f, depending on whether the second son will or will not
have hemophilia. On this sample description space, we define the events
$A_1$, $A_2$, and $A_3$: $A_1$ is the event that the mother is a hemophilia carrier,
$A_2$ is the event that the first son has hemophilia, and $A_3$ is the event that
the second son will have hemophilia. To specify a probability function
on the subsets of S, we specify all conditional probabilities of the form
given in (4.2):


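(4.6)  $$\begin{aligned}
&P[A_1] = 2m, \qquad P[A_1^c] = 1 - 2m, \\
&P[A_2 \mid A_1] = \tfrac12, \qquad P[A_2^c \mid A_1] = \tfrac12, \\
&P[A_2 \mid A_1^c] = m, \qquad P[A_2^c \mid A_1^c] = 1 - m, \\
&P[A_3 \mid A_1, A_2] = P[A_3 \mid A_1, A_2^c] = \tfrac12, \\
&P[A_3^c \mid A_1, A_2] = P[A_3^c \mid A_1, A_2^c] = \tfrac12, \\
&P[A_3 \mid A_1^c, A_2] = P[A_3 \mid A_1^c, A_2^c] = m, \\
&P[A_3^c \mid A_1^c, A_2] = P[A_3^c \mid A_1^c, A_2^c] = 1 - m.
\end{aligned}$$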


In making these assumptions (4.6) we have used the fact that the woman has
no family history of hemophilia. A boy usually carries an X chromosome
and a Y chromosome; he has hemophilia if and only if, instead of an X
chromosome, he has an X′ chromosome which bears a gene causing
hemophilia. Let m be the probability of mutation of an X chromosome
into an X′ chromosome. Now the mother carries two X chromosomes.
Event $A_1$ can occur only if at least one of these X chromosomes is a
mutant; this will happen with probability $1 - (1 - m)^2 \doteq 2m$, since $m^2$
is much smaller than 2m. Assuming that the woman is a hemophilia
carrier and exactly one of her chromosomes is X′, it follows that each of her sons
will have probability ½ of inheriting the X′ chromosome.

We are seeking $P[A_3 \mid A_2]$. Now

(4.7)  $$P[A_3 \mid A_2] = \frac{P[A_2 A_3]}{P[A_2]}.$$

To compute $P[A_2 A_3]$, we use the formula

(4.8)  $$\begin{aligned} P[A_2 A_3] &= P[A_1 A_2 A_3] + P[A_1^c A_2 A_3] \\ &= P[A_1]\,P[A_2 \mid A_1]\,P[A_3 \mid A_1, A_2] + P[A_1^c]\,P[A_2 \mid A_1^c]\,P[A_3 \mid A_1^c, A_2] \\ &= 2m\left(\tfrac12\right)\left(\tfrac12\right) + (1 - 2m)\,m\,m \\ &\doteq \tfrac12 m, \end{aligned}$$

since we may consider 1 − 2m as approximately equal to 1 and $m^2$ as
approximately equal to 0. To compute $P[A_2]$, we use the formula

(4.9)  $$\begin{aligned} P[A_2] &= P[A_2 \mid A_1]\,P[A_1] + P[A_2 \mid A_1^c]\,P[A_1^c] \\ &= \tfrac12 (2m) + m(1 - 2m) \\ &\doteq 2m. \end{aligned}$$

Consequently,

(4.10)  $$P[A_3 \mid A_2] \doteq \frac{\tfrac12 m}{2m} = \frac14.$$

Thus the conditional probability that the second son of a woman with no
family history of hemophilia will have hemophilia, given that her first son
has hemophilia, is approximately ¼! ◄
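The approximation (4.10) can be compared with the exact value implied by the model. The sketch below builds each joint probability from the conditional probabilities (4.6) via the chain rule (4.1) and then forms the ratio (4.7) without dropping any powers of m. The names joint and second_given_first are hypothetical, and $P[A_1] = 2m$ is taken exactly as in (4.6) rather than as $1 - (1 - m)^2$.

```python
from fractions import Fraction
from itertools import product

def joint(a1: bool, a2: bool, a3: bool, m: Fraction) -> Fraction:
    """P[{(z1, z2, z3)}] via the chain rule (4.1) from the conditional probabilities (4.6).

    a1: mother is a carrier; a2: first son hemophilic; a3: second son hemophilic.
    """
    p_a1 = 2 * m if a1 else 1 - 2 * m         # P[A1] = 2m, as in (4.6)
    q = Fraction(1, 2) if a1 else m            # chance that a son is hemophilic
    p_a2 = q if a2 else 1 - q
    p_a3 = q if a3 else 1 - q                  # given the mother, the sons are independent
    return p_a1 * p_a2 * p_a3

def second_given_first(m: Fraction) -> Fraction:
    """P[A3 | A2] = P[A2 A3] / P[A2], as in (4.7), computed exactly."""
    p_a2a3 = sum(joint(a1, True, True, m) for a1 in (True, False))
    p_a2 = sum(joint(a1, True, a3, m) for a1, a3 in product((True, False), repeat=2))
    return p_a2a3 / p_a2

m = Fraction(1, 100_000)
print(float(second_given_first(m)))   # very close to 0.25, far above m
```

Working the algebra by hand gives the exact ratio $(\tfrac12 + m - 2m^2)/(2 - 2m)$, which tends to ¼ as m tends to 0, in line with (4.10).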


A very important use of the notion of conditional probability derives
from the following extension of (4.5). Let $C_1, C_2, \ldots, C_n$ be n events,
each of positive probability, which are mutually exclusive and are also
exhaustive (that is, the union of all the events $C_1, C_2, \ldots, C_n$ is equal to
the certain event). Then, for any event B one may express the unconditional
probability P[B] of B in terms of the conditional probabilities $P[B \mid C_1], \ldots,
P[B \mid C_n]$ and the unconditional probabilities $P[C_1], \ldots, P[C_n]$:

(4.11)  $$P[B] = P[B \mid C_1]\,P[C_1] + \cdots + P[B \mid C_n]\,P[C_n]$$

if

$$C_1 \cup C_2 \cup \cdots \cup C_n = S, \qquad C_i C_j = \emptyset \ \text{ for } i \ne j, \qquad P[C_i] > 0.$$

Equation (4.11) follows immediately from the relation

(4.12)  $$P[B] = P[BC_1] + P[BC_2] + \cdots + P[BC_n],$$

which holds because the events $BC_1, \ldots, BC_n$ are mutually exclusive and their
union is B, together with the fact that $P[BC_i] = P[B \mid C_i]\,P[C_i]$ for any event $C_i$.

► Example 4D. On drawing a sample from a sample. Consider a box
containing five radio tubes selected at random from the output of a
machine, which is known to be 20% defective on the average (that is, the
probability that an item produced by the machine will be defective is 0.2).
(i) Find the probability that a tube selected from the box will be defective.
(ii) Suppose that a tube selected at random from the box is defective; what
is the probability that a second tube selected at random from the box will
be defective?

Solution: To describe the results of the experiment that consists in
selecting five tubes from the output of the machine and then selecting one
tube from among the five previously selected, we write a 6-tuple
$(z_1, z_2, z_3, z_4, z_5, z_6)$; for $k = 1, 2, \ldots, 5$, $z_k$ is equal to s or f, depending on whether
the kth tube selected is defective or nondefective, whereas $z_6$ is equal to s
or f, depending on whether the tube selected from those previously
selected is defective or nondefective. For $j = 0, \ldots, 5$ let $C_j$ denote the
event that j defective tubes were selected from the output of the machine.
Assuming that the selections were independent, $P[C_j] = \binom{5}{j}(0.2)^j(0.8)^{5-j}$.
Let B denote the event that the sixth tube selected (the one drawn from the box) is defective.
We assume that $P[B \mid C_j] = j/5$; in words, each of the tubes in the box is
equally likely to be chosen. By (4.11), it follows that

(4.13)  $$P[B] = \sum_{j=0}^{5} \frac{j}{5} \binom{5}{j} (0.2)^j (0.8)^{5-j}.$$

To evaluate the sum in (4.13), we write it as

(4.14)  $$\sum_{j=0}^{5} \frac{j}{5} \binom{5}{j} (0.2)^j (0.8)^{5-j} = 0.2 \sum_{j=1}^{5} \binom{4}{j-1} (0.2)^{j-1} (0.8)^{5-j} = 0.2,$$

in which we have used the easily verifiable fact that

(4.15)  $$\frac{j}{n} \binom{n}{j} = \binom{n-1}{j-1},$$

and the fact that the last sum in (4.14) is equal to 1 by the binomial
theorem. Combining (4.13) and (4.14), we have P[B] = 0.2. In words, we
have proved that selecting an item randomly from a sample which has been
selected randomly from a larger population is statistically equivalent to
selecting the item from the larger population. Note that the fact that
P[B] = 0.2 does not imply that the box containing five tubes will always
contain one defective tube.
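Part (i) of Example 4D can be verified by evaluating the sum (4.13) directly with exact fractions, and the identity (4.15) can be checked term by term for n = 5. This is a sketch; the constant names P_DEFECT and BOX and the helper p_C are hypothetical.

```python
from fractions import Fraction
from math import comb

P_DEFECT = Fraction(1, 5)   # machine's defective rate, 0.2
BOX = 5                     # tubes in the box

def p_C(j: int) -> Fraction:
    """P[C_j]: exactly j of the BOX independently selected tubes are defective."""
    return comb(BOX, j) * P_DEFECT**j * (1 - P_DEFECT)**(BOX - j)

# (4.13): P[B] = sum_j P[B | C_j] P[C_j], with P[B | C_j] = j/5
p_B = sum(Fraction(j, BOX) * p_C(j) for j in range(BOX + 1))
print(p_B)   # 1/5

# identity (4.15): (j/n) C(n, j) = C(n-1, j-1) for j = 1, ..., n
assert all(Fraction(j, BOX) * comb(BOX, j) == comb(BOX - 1, j - 1)
           for j in range(1, BOX + 1))
```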

Let us next consider part (ii) of example 4D. To describe the results of
the experiment that consists in selecting five tubes from the output of the
machine and then selecting two tubes from among the five previously
selected, we write a 7-tuple $(z_1, z_2, \ldots, z_7)$, in which $z_6$ and $z_7$ denote the
tubes drawn from the box containing the first five tubes selected. Let
$C_0, \ldots, C_5$ and B be defined as before. Let A be the event that the seventh
tube is defective. We seek $P[A \mid B]$. Now, if two tubes, each of which
has probability 0.2 of being defective, are drawn independently, the
conditional probability that the second tube will be defective, given that
the first tube is defective, is equal to the unconditional probability that the
second tube will be defective, which is equal to 0.2. We now proceed to
prove that $P[A \mid B] = 0.2$. In so doing, we are proving a special case of
the principle that a sample of size 2, drawn without replacement from a
sample of any size whose members are selected independently from a given
population, has statistically the same properties as a sample of size 2 whose
members are selected independently from the population! More general
statements of this principle are given in the theoretical exercises of section
4, Chapter 4. We prove that $P[A \mid B] = 0.2$ under the assumption that
$P[AB \mid C_j] = j(j - 1)/20$ for $j = 0, \ldots, 5$. Then, by (4.11),

$$P[AB] = \sum_{j=0}^{5} \frac{j(j-1)}{20} \binom{5}{j} (0.2)^j (0.8)^{5-j} = (0.2)^2 \sum_{j=2}^{5} \binom{3}{j-2} (0.2)^{j-2} (0.8)^{5-j} = (0.2)^2.$$

Consequently, $P[A \mid B] = P[AB]/P[B] = (0.2)^2/(0.2) = 0.2$. ◄
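Part (ii) follows the same pattern, with $P[AB \mid C_j] = (j/5)\,((j - 1)/4)$: the first tube drawn from the box is defective with probability j/5, and, given that, the second is defective with probability (j − 1)/4. The sketch below (with the same hypothetical names as in a standalone setting) confirms $P[AB] = (0.2)^2$ and $P[A \mid B] = 0.2$ exactly.

```python
from fractions import Fraction
from math import comb

P_DEFECT = Fraction(1, 5)   # machine's defective rate, 0.2
BOX = 5                     # tubes in the box

def p_C(j: int) -> Fraction:
    """P[C_j]: exactly j of the BOX tubes in the box are defective."""
    return comb(BOX, j) * P_DEFECT**j * (1 - P_DEFECT)**(BOX - j)

# P[AB | C_j] = j(j - 1)/20: both tubes drawn without replacement are defective
p_AB = sum(Fraction(j * (j - 1), BOX * (BOX - 1)) * p_C(j) for j in range(BOX + 1))
p_B = sum(Fraction(j, BOX) * p_C(j) for j in range(BOX + 1))
print(p_AB, p_AB / p_B)   # 1/25 1/5
```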
j=2 6: 


Bayes's Theorem. There is an interesting consequence to (4.11), which
has led to much philosophical speculation and has been the source of
much controversy. Let $C_1, C_2, \ldots, C_n$ be n mutually exclusive and
exhaustive events, and let B be an event for which one knows the conditional
probabilities $P[B \mid C_i]$ of B, given $C_i$, and also the absolute probabilities
$P[C_i]$. One may then compute the conditional probability $P[C_i \mid B]$ of any
one of the events $C_i$, given B, by the following formula:

(4.16)  $$P[C_i \mid B] = \frac{P[BC_i]}{P[B]} = \frac{P[B \mid C_i]\,P[C_i]}{\sum_{j=1}^{n} P[B \mid C_j]\,P[C_j]}.$$

The relation expressed by (4.16) is called "Bayes's theorem" or "Bayes's
formula," after the English philosopher Thomas Bayes.* If the events $C_i$
are called "causes," then Bayes's formula can be regarded as a formula
for the probability that the event B, which has occurred, is the result of the
"cause" $C_i$. In this way (4.16) has been interpreted as a formula for the
probabilities of "causes" or "hypotheses." The difficulty with this interpretation,
however, is that in many contexts one will rarely know the
probabilities, especially the unconditional probabilities $P[C_i]$ of the
"causes," which enter into the right-hand side of (4.16). However, Bayes's
theorem has its uses, as the following examples† indicate.
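Formula (4.16) amounts to forming the products $P[B \mid C_j]\,P[C_j]$, adding them to obtain P[B] as in (4.11), and dividing each product by that sum. The sketch below uses the hypothetical name bayes_posterior. As an illustration not posed in the text, it reuses the urns of Example 4B and asks which urn was probably chosen, given that a white ball was drawn.

```python
from fractions import Fraction

def bayes_posterior(priors, likelihoods):
    """(4.16): P[C_i | B] = P[B | C_i] P[C_i] / sum_j P[B | C_j] P[C_j]."""
    joint = [l * p for l, p in zip(likelihoods, priors)]   # P[B C_i]
    total = sum(joint)                                      # P[B], by (4.11)
    if total == 0:
        raise ValueError("P[B] must be positive")
    return [x / total for x in joint]

# Urns of Example 4B: given a white ball, which urn was chosen?
post = bayes_posterior([Fraction(1, 2), Fraction(1, 2)],
                       [Fraction(5, 8), Fraction(3, 10)])
print(post)   # [Fraction(25, 37), Fraction(12, 37)]
```

Drawing a white ball raises the probability of urn I from ½ to 25/37, because urn I holds the larger proportion of white balls.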


* A reprint of Bayes's original essay may be found in Biometrika, Vol. 45 (1958),
pp. 293-315.

† The use of Bayes's formula to evaluate probabilities during the course of play of a
bridge game is illustrated in Dan F. Waugh and Frederick V. Waugh, "On Probabilities
in Bridge," Journal of the American Statistical Association, Vol. 48 (1953), pp. 79-87.

► Example 4E. Cancer diagnosis. Suppose, contrary to fact, there were
a diagnostic test for cancer with the properties that $P[A \mid C] = 0.95$,
$P[A^c \mid C^c] = 0.95$, in which C denotes the event that a person tested has
cancer and A denotes the event that the test states that the person tested
has cancer. Let us compute $P[C \mid A]$, the probability that a person who
according to the test has cancer actually has it. We have

(4.17)  $$P[C \mid A] = \frac{P[AC]}{P[A]} = \frac{P[A \mid C]\,P[C]}{P[A \mid C]\,P[C] + P[A \mid C^c]\,P[C^c]}.$$

Let us assume that the probability that a person taking the test actually
has cancer is given by P[C] = 0.005. Then

$$P[C \mid A] = \frac{(0.95)(0.005)}{(0.95)(0.005) + (0.05)(0.995)} = \frac{0.00475}{0.00475 + 0.04975} = 0.087.$$

One should carefully consider the meaning of this result. On the one hand,
the cancer diagnostic test is highly reliable, since it will detect cancer in
95% of the cases in which cancer is present. On the other hand, in only
8.7% of the cases in which the test gives a positive result and asserts cancer
to be present is it actually true that cancer is present! (This example is
continued in the exercises of this section.) ◄
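The striking figure of 8.7% comes from the small value of P[C]. The sketch below evaluates (4.17) as a function of P[C]. The parameter names sensitivity (for $P[A \mid C]$) and specificity (for $P[A^c \mid C^c]$) are conventional labels introduced here, not the text's, and the larger values of P[C] in the loop are hypothetical, chosen only to show how strongly the answer depends on that unconditional probability.

```python
def prob_cancer_given_positive(prevalence: float,
                               sensitivity: float = 0.95,
                               specificity: float = 0.95) -> float:
    """Bayes's formula (4.17) for P[C | A].

    prevalence = P[C], sensitivity = P[A | C], specificity = P[A^c | C^c].
    """
    true_pos = sensitivity * prevalence                # P[A | C] P[C]
    false_pos = (1 - specificity) * (1 - prevalence)   # P[A | C^c] P[C^c]
    return true_pos / (true_pos + false_pos)

print(round(prob_cancer_given_positive(0.005), 3))   # 0.087, as in the text

# The same test applied to hypothetical groups in which cancer is more common
for prevalence in (0.005, 0.05, 0.5):
    print(prevalence, round(prob_cancer_given_positive(prevalence), 3))
```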
► Example 4F. Prior and posterior probability. Consider an urn that
contains a large number of coins. Not all of the coins are necessarily fair.
Let a coin be chosen randomly from the urn and tossed 100 times.
Suppose that in the 100 tosses heads appear 55 times. What is the
probability that the coin selected is a fair coin (that is, that the probability
that the coin will fall heads at each toss is equal to ½)?

Solution: To describe the results of the experiment, we write a 101-tuple
$(z_1, z_2, \ldots, z_{101})$. The components $z_2, \ldots, z_{101}$ are s or f, depending on
whether the outcome of the respective toss is heads or tails. What are the
possible values of the first component? We assume that there is a set of N
numbers, $p_1, p_2, \ldots, p_N$, each between 0 and 1, and such that any coin in
the urn has as its probability of falling heads some one of the numbers
$p_1, \ldots, p_N$. Having selected a coin from the urn, we let $z_1$ denote the
probability that the coin will fall heads; consequently, $z_1$ is one of the
numbers $p_1, \ldots, p_N$. Now, for $j = 1, 2, \ldots, N$ let $C_j$ be the event that the
coin selected has probability $p_j$ of falling heads, and let B be the event
that the coin selected yielded 55 heads in 100 tosses. Let $j_0$ be the number,
1 to N, such that $p_{j_0} = \frac12$. We are now seeking $P[C_{j_0} \mid B]$, the conditional
probability that the coin selected is a fair coin, given that it yielded 55 heads
in 100 tosses. In order to evaluate $P[C_{j_0} \mid B]$ by (4.16), we require a
knowledge of $P[C_j]$ and $P[B \mid C_j]$ for $j = 1, \ldots, N$. By the binomial law,

(4.19)  $$P[B \mid C_j] = \binom{100}{55} p_j^{55} (1 - p_j)^{45}.$$

The probabilities $P[C_j]$ cannot be computed but must be assumed. The probability $P[C_j]$ represents the proportion of coins in the urn that have probability $p_j$ of falling heads. It is clear that the value we obtain for $P[C_{j_0} \mid B]$ depends directly on the values we assume for $P[C_1], \ldots, P[C_N]$. If the latter probabilities are unknown to us, then we must resign ourselves to not being able to compute $P[C_{j_0} \mid B]$. However, let us obtain a numerical answer for $P[C_{j_0} \mid B]$ under the assumption that $P[C_1] = \cdots = P[C_N] = 1/N$, so that a coin selected from the urn is equally likely to have any one of the probabilities $p_1, \ldots, p_N$. We then obtain that


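$$P[C_{j_0} \mid B] = \frac{(1/N)\binom{100}{55}(1/2)^{100}}{(1/N)\sum_{j=1}^{N}\binom{100}{55}(p_j)^{55}(1 - p_j)^{45}}. \tag{4.20}$$

Let us next assume that $N = 9$ and $p_j = j/10$ for $j = 1, 2, \ldots, 9$. Then $j_0 = 5$, and

$$P[C_5 \mid B] = \frac{\binom{100}{55}(1/2)^{100}}{\sum_{j=1}^{9}\binom{100}{55}(j/10)^{55}\big((10 - j)/10\big)^{45}} = \frac{0.048475}{0.097664} = 0.496. \tag{4.21}$$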


The probability $P[C_5] = 1/9$ is called the prior (or a priori) probability of the event $C_5$; the conditional probability $P[C_5 \mid B] = 0.496$ is called the posterior (or a posteriori) probability of the event $C_5$. The prior probability is an unconditional probability that is known to us before any observations are taken. The posterior probability is a conditional probability that is of interest to us only if it is known that the conditioning event has occurred.
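The computation in (4.19) to (4.21) can be reproduced as follows. This is a minimal Python sketch assuming, as the text does, equal priors $P[C_j] = 1/N$ and candidate values $p_j = j/10$; the names `posterior`, `ps`, and `priors` are our own. It uses `math.comb` (Python 3.8 or later).

```python
from math import comb

def posterior(heads, tosses, ps, priors=None):
    """Posterior probabilities P[C_j | B] for the candidate head-probabilities ps.

    Uses the binomial likelihood (4.19) and Bayes's formula; the priors
    default to P[C_j] = 1/N, the assumption behind (4.20).
    """
    if priors is None:
        priors = [1.0 / len(ps)] * len(ps)
    likelihoods = [comb(tosses, heads) * p**heads * (1 - p)**(tosses - heads)
                   for p in ps]                                   # P[B | C_j]
    joint = [lik * prior for lik, prior in zip(likelihoods, priors)]
    total = sum(joint)                                            # P[B]
    return [x / total for x in joint]


ps = [j / 10 for j in range(1, 10)]   # N = 9, p_j = j/10
post = posterior(55, 100, ps)
print(round(post[4], 3))              # P[C_5 | B] with p_5 = 1/2: 0.496
```

Passing a different `priors` list shows concretely how the posterior depends on the assumed prior probabilities.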


Our next example illustrates a controversial use of Bayes's theorem. 


► Example 4G. Laplace's rule of succession. Consider a coin that in $n$ independent tosses yields $k$ heads. What is the probability that $n'$ subsequent independent tosses will yield $k'$ heads? The problem may also be phrased in terms of drawing balls from an urn. Consider an urn that contains white and red balls in unknown proportions. In a sample of size $n$, drawn with replacement from the urn, $k$ white balls appear. What is the probability that a sample of size $n'$ drawn with replacement will contain $k'$ white balls? A particular case of this problem, in which $k = n$ and $k' = n'$, can be interpreted as a simple form of the fundamental problem of inductive inference if one formulates the problem as follows: if $n$ independent trials of an experiment have resulted in success, what is the probability that $n'$ additional independent trials will result in success? Another reformulation is this: if the results of $n$ independent experiments, performed to test a theory, agree with the theory, what is the probability that $n'$ additional independent experiments will agree with the theory?

Solution: To describe the results of our observations, we write an $(n + n' + 1)$-tuple $(z_1, z_2, \ldots, z_{n+n'+1})$ in which the components $z_2, \ldots, z_{n+1}$ describe the outcomes of the coin tosses which have been made and the components $z_{n+2}, \ldots, z_{n+n'+1}$ describe the outcomes of the subsequent coin tosses. The first component $z_1$ describes the probability that the coin tossed has of falling heads; we assume that there are $N$ known numbers, $p_1, p_2, \ldots, p_N$, which $z_1$ can take as its value. We have italicized this assumption to indicate that it is considered controversial. For $j = 1, 2, \ldots, N$ let $C_j$ be the event that the coin tossed has probability $p_j$ of falling heads. Let $B$ be the event that the coin yields $n$ heads in its first $n$ tosses, and let $A$ be the event that it yields $n'$ heads in its subsequent $n'$ tosses. We are seeking $P[A \mid B]$. Now


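$$P[AB] = \sum_{j=1}^{N} P[AB \mid C_j]\,P[C_j] = \sum_{j=1}^{N} (p_j)^{n+n'}\,P[C_j], \tag{4.22}$$

whereas

$$P[B] = \sum_{j=1}^{N} (p_j)^{n}\,P[C_j]. \tag{4.23}$$

Let us now assume that $p_j$ is equal to $j/N$ and that $P[C_j] = 1/N$. Then

$$P[A \mid B] = \frac{(1/N)\sum_{j=1}^{N} (j/N)^{n+n'}}{(1/N)\sum_{j=1}^{N} (j/N)^{n}}. \tag{4.24}$$

The sums in (4.24) may be approximately evaluated in the case that $N$ is large by means of the integral calculus. The sums can be regarded as approximating sums of Riemann integrals, and we have

$$\frac{1}{N}\sum_{j=1}^{N}\left(\frac{j}{N}\right)^{n+n'} \approx \int_0^1 x^{n+n'}\,dx = \frac{1}{n + n' + 1}, \qquad \frac{1}{N}\sum_{j=1}^{N}\left(\frac{j}{N}\right)^{n} \approx \int_0^1 x^{n}\,dx = \frac{1}{n + 1}. \tag{4.25}$$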


Consequently, given that the first $n$ tosses yielded heads, the conditional probability that $n'$ subsequent tosses of the coin will yield heads, under the assumption that the probability of the coin falling heads is equally likely to be any one of the numbers $1/N, 2/N, \ldots, N/N$, and $N$ is large, is given by

$$P[A \mid B] = \frac{n + 1}{n + n' + 1}. \tag{4.26}$$

Equation (4.26) is known as Laplace's general rule of succession. If we take $n' = 1$, then

$$P[A \mid B] = \frac{n + 1}{n + 2}. \tag{4.27}$$

Equation (4.27) is known as Laplace's special rule of succession.
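How good the Riemann-sum approximation (4.25) is can be seen by evaluating (4.24) exactly for a few values of $N$. This is a minimal sketch in exact rational arithmetic via Python's `fractions.Fraction`; the function names are illustrative, and the choice $n = 10$, $n' = 1$ is arbitrary.

```python
from fractions import Fraction

def succession_exact(n, n_prime, N):
    """P[A | B] of (4.24) with p_j = j/N and P[C_j] = 1/N, computed exactly."""
    num = sum(Fraction(j, N) ** (n + n_prime) for j in range(1, N + 1))
    den = sum(Fraction(j, N) ** n for j in range(1, N + 1))
    return num / den          # the common factor 1/N cancels


def succession_rule(n, n_prime):
    """Laplace's general rule of succession (4.26), the large-N limit."""
    return Fraction(n + 1, n + n_prime + 1)


for N in (10, 100, 1000):
    print(N, float(succession_exact(10, 1, N)), float(succession_rule(10, 1)))
```

The exact ratio approaches $(n + 1)/(n + n' + 1)$ as $N$ grows, which is all that (4.26) claims.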

Equation (4.27) has been interpreted by some writers on probability theory to imply that if a theory has been verified in $n$ consecutive trials then the probability of its being verified on the $(n + 1)$st trial is $(n + 1)/(n + 2)$. That the rule has a certain appeal at first acquaintance may be seen from the following example:

Consider a tourist in a foreign city who scarcely understands the language. With trepidation, he selects a restaurant in which to eat. After ten meals taken there he has felt no ill effects. Consequently, he goes quite confidently to the restaurant the eleventh time in the belief that, according to the rule of succession, the probability is 11/12 that he will not be poisoned by his next meal.

However, it is easy to exhibit applications of the rule that lead to absurd answers. A boy is 10 years old today. The rule says that, having lived ten years, he has probability 11/12 of living one more year. On the other hand, his 80-year-old grandfather has probability 81/82 of living one more year! Yet, in fact, the boy has a greater probability of living one more year.
Laplace gave the following often-quoted application of the special rule of succession. "Assume," he says, "that history goes back 5000 years, that is, 1,826,213 days. The sun rose each day and so you can bet 1,826,214 against 1 that the sun will rise again tomorrow." However, before believing this assertion yourself, see if you would believe the following consequence of the rule of succession: the probability, given the sun having risen on each of the last 1,826,213 days, that it will rise on each of the next 1,826,214 days is 1/2, which means that the probability is 1/2 that on at least one of the next 1,826,214 days the sun will not rise. ◄
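Both of Laplace's figures follow from the two rules of succession in exact arithmetic. This is a minimal sketch; the function names `special_rule` and `general_rule` are illustrative labels for (4.27) and (4.26).

```python
from fractions import Fraction

def special_rule(n):
    """Laplace's special rule of succession (4.27): (n + 1)/(n + 2)."""
    return Fraction(n + 1, n + 2)


def general_rule(n, n_prime):
    """Laplace's general rule of succession (4.26): (n + 1)/(n + n' + 1)."""
    return Fraction(n + 1, n + n_prime + 1)


days = 1_826_213                       # Laplace's 5000 years of sunrises
print(special_rule(days))              # 1826214/1826215, i.e. odds of 1,826,214 to 1
print(general_rule(days, days + 1))    # 1/2
```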

It is to be emphasized that Bayes's formula and Laplace's rule of succession are true theorems of mathematical probability theory. The foregoing examples do not in any way cast doubt on the validity of these theorems. Rather they serve to illustrate what may be called the fundamental principle of applied probability theory: before applying a theorem, one must carefully ponder whether the hypotheses of the theorem may be assumed to be satisfied.


THEORETICAL EXERCISES


4.1. An urn contains $M$ balls, of which $M_W$ are white (where $M_W \le M$). Let a sample of size $m$ (where $m \le M$) be drawn from the urn with replacement [without replacement] and deposited in a second urn. Let a sample of size $n$ (where $n \le m$) be drawn from the second urn without replacement. Show that for $k = 0, 1, \ldots, n$ the probability that the second sample will contain exactly $k$ white balls continues to be given by (3.2) [(3.1)] of Chapter 2. The result shows that, as one might expect, drawing a sample of size $n$ from a sample of larger size is statistically equivalent to drawing a sample of size $n$ from the urn. An alternate statement of this theorem, and an outline of the proof, is given in theoretical exercise 4.1 of Chapter 4.


4.2. Consider a box containing $N$ radio tubes selected at random from the output of a machine; the probability $p$ that an item produced by the machine is defective is known.

(i) Let $k \le n \le N$ be integers. Show that the probability that $n$ tubes selected at random from the box will have $k$ defectives is given by $\binom{n}{k} p^k q^{n-k}$, where $q = 1 - p$.

(ii) Suppose that $m$ tubes are selected at random from the box and found to be defective. Show that the probability that $n$ tubes selected at random from the remaining $N - m$ tubes in the box will contain $k$ defectives is equal to $\binom{n}{k} p^k q^{n-k}$.

(iii) Suppose that $m + n$ tubes are selected at random from the box and tested. You are informed that at least $m$ of the tubes are defective; show that the probability that exactly $m + k$ tubes are defective, where $k$ is an integer from 0 to $n$, is given by (3.13). Express in words the conclusions implied by this exercise.


4.3. Consider an urn containing $M$ balls, of which $M_W$ are white. Let $N$ be an integer such that $N \ge M_W$. Choose an integer $n$ at random from the set $\{1, 2, \ldots, N\}$, and then choose a sample of size $n$ without replacement from the urn. Show that the probability that all the balls in the sample will be white (letting $M_B = M - M_W$) is equal to

$$\frac{1}{N}\sum_{n=1}^{N}\frac{(M_W)_n}{(M)_n} = \frac{M_W}{N(M_B + 1)}.$$
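Before attempting a proof, the identity can be checked numerically over many small cases. This is a minimal sketch in exact arithmetic; `falling`, `prob_all_white`, and `closed_form` are illustrative names, and $(x)_k$ is taken to be the falling factorial $x(x - 1)\cdots(x - k + 1)$.

```python
from fractions import Fraction

def falling(x, k):
    """Falling factorial (x)_k = x (x - 1) ... (x - k + 1)."""
    result = 1
    for i in range(k):
        result *= x - i
    return result


def prob_all_white(M, M_W, N):
    """Left side of the identity: sample size n uniform on {1, ..., N}, N >= M_W.

    Terms with n > M_W vanish because (M_W)_n = 0, so only n <= M_W are summed
    (this also avoids dividing by (M)_n = 0 when n > M).
    """
    total = sum((Fraction(falling(M_W, n), falling(M, n)) for n in range(1, M_W + 1)),
                Fraction(0))
    return total / N


def closed_form(M, M_W, N):
    """Right side of the identity, with M_B = M - M_W."""
    return Fraction(M_W, N * (M - M_W + 1))


print(prob_all_white(12, 5, 7), closed_form(12, 5, 7))
```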


4.4. An application of Bayes's theorem. Suppose that in answering a question on a multiple choice test an examinee either knows the answer or he guesses. Let $p$ be the probability that he will know the answer, and let $1 - p$ be the probability that he will guess. Assume that the probability of answering a question correctly is unity for an examinee who knows the answer and $1/m$ for an examinee who guesses; $m$ is the number of multiple choice alternatives. Show that the conditional probability that an examinee knew the answer to a question, given that he has correctly answered it, is equal to

$$\frac{mp}{1 + (m - 1)p}.$$
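A quick numerical check of the claimed expression applies Bayes's theorem directly. This is a minimal sketch in exact arithmetic; the function names and the sample values $p = 1/2$, $m = 4$ are illustrative.

```python
from fractions import Fraction

def prob_knew_given_correct(p, m):
    """Bayes's theorem: P[knew | correct], with P[knew] = p and m alternatives."""
    p = Fraction(p)
    p_correct = p * 1 + (1 - p) * Fraction(1, m)   # total probability of a correct answer
    return (p * 1) / p_correct


def claimed_formula(p, m):
    """The expression m p / (1 + (m - 1) p) to be shown in the exercise."""
    p = Fraction(p)
    return m * p / (1 + (m - 1) * p)


print(prob_knew_given_correct(Fraction(1, 2), 4), claimed_formula(Fraction(1, 2), 4))  # 4/5 4/5
```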


4.5. Solution of a difference equation. The difference equation

$$p_n = a\,p_{n-1} + b, \qquad n = 2, 3, \ldots,$$

in which $a$ and $b$ are given constants, arises in the theory of Markov dependent trials (see section 5). By mathematical induction, show that if a sequence of numbers $p_1, p_2, \ldots, p_n$ satisfies this difference equation, and if $a \ne 1$, then

$$p_n = \left(p_1 - \frac{b}{1 - a}\right)a^{n-1} + \frac{b}{1 - a}.$$
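Iterating the difference equation and evaluating the closed form side by side gives a quick check of the formula. This is a minimal sketch in exact arithmetic; the function names and the values of $a$, $b$, $p_1$ are arbitrary illustrative choices.

```python
from fractions import Fraction

def iterate(p1, a, b, n):
    """p_1, ..., p_n generated by the difference equation p_k = a p_{k-1} + b."""
    values = [Fraction(p1)]
    for _ in range(n - 1):
        values.append(a * values[-1] + b)
    return values


def closed_form(p1, a, b, n):
    """(p_1 - b/(1 - a)) a^(n-1) + b/(1 - a); requires a != 1."""
    fixed_point = Fraction(b) / (1 - Fraction(a))
    return (Fraction(p1) - fixed_point) * Fraction(a) ** (n - 1) + fixed_point


a, b, p1 = Fraction(-1, 3), Fraction(2, 3), Fraction(1, 3)   # illustrative values
print(all(iterate(p1, a, b, 10)[n - 1] == closed_form(p1, a, b, n) for n in range(1, 11)))
```

The quantity $b/(1 - a)$ is the fixed point of the recursion; the term $a^{n-1}$ measures how far $p_n$ still is from it.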


EXERCISES 


4.1. Urn I contains 5 white and 7 black balls. Urn II contains 4 white and 2 black balls. Find the probability of drawing a white ball if (i) 1 urn is selected at random, and a ball is drawn from it, (ii) the 2 urns are emptied into a third urn from which 1 ball is drawn.


4.2. Urn I contains 5 white and 7 black balls. Urn II contains 4 white and 2 black balls. An urn is selected at random, and a ball is drawn from it. Given that the ball drawn is white, what is the probability that urn I was chosen?

4.3. A man draws a ball from an urn containing 4 white and 2 red balls. If the ball is white, he does not return it to the urn; if the ball is red, he does return it. He draws another ball. Let $A$ be the event that the first ball drawn is white and let $B$ be the event that the second ball drawn is white. Answer each of the following statements, true or false. (i) $P[A]$ = [illegible], (ii) $P[B]$ = [illegible], (iii) $P[B \mid A]$ = [illegible], (iv) $P[A \mid B]$ = [illegible], (v) The events $A$ and $B$ are mutually exclusive. (vi) The events $A$ and $B$ are independent.
4.4. From an urn containing 6 white and 4 black balls, 5 balls are transferred into an empty second urn. From it 3 balls are transferred into an empty box. One ball is drawn from the box; it turns out to be white. What is the probability that exactly 4 of the balls transferred from the first to the second urn will be white?


4.5. Consider an urn containing 12 balls, of which 8 are white. Let a sample of size 4 be drawn with replacement (without replacement). Next, let a ball be selected randomly from the sample of size 4. Find the probability that it will be white.


4.6. Urn I contains 6 white and 4 black balls. Urn II contains 2 white and 2 black balls. From urn I 2 balls are transferred to urn II. A sample of size 2 is then drawn without replacement from urn II. What is the probability that the sample will contain exactly 1 white ball?


4.7. Consider a box containing 5 radio tubes selected at random from the output of a machine, which is known to be 20% defective on the average (that is, the probability that an item produced by the machine will be defective is 0.2). Suppose that 2 tubes are selected at random from the box and tested. You are informed that at least 1 of the tubes selected is defective; what is the probability that both tubes will be defective?


4.8. Let the events $A$ and $C$ be defined as in example 4E. Let $P[A \mid C] = P[A^c \mid C^c] = R$ and $P[C] = 0.005$. What value must $R$ have in order that $P[C \mid A] = 0.95$? Interpret your answer.


4.9. In a certain college the geographical distribution of men students is as follows: 50% come from the East, 30% come from the Midwest, and 20% come from the Far West. The following proportions of the men students wear ties: 80% of the Easterners, 60% of the Midwesterners, and 40% of the Far Westerners. What is the probability that a student who wears a tie comes from the East? From the Midwest? From the Far West?


4.10. Consider an urn containing 10 balls, of which 4 are white. Choose an integer $n$ at random from the set $\{1, 2, 3, 4, 5, 6\}$, and then choose a sample of size $n$ without replacement from the urn. Find the probability that all the balls in the sample will be white.


4.11. Each of 3 boxes, identical in appearance, has 2 drawers. Box A contains a gold coin in each drawer; box B contains a silver coin in each drawer; box C contains a gold coin in 1 drawer and a silver coin in the other. A box is chosen, one of its drawers is opened, and a gold coin is found.

(i) What is the probability that the other drawer contains a silver coin? Write out the probability space of the experiment. Why is it fallacious to reason that the probability is 1/2 that there will be a silver coin in the second drawer, since there are 2 possible types of coins, gold or silver, that may be found there?

(ii) What is the probability that the box chosen was box A? Box B? Box C?

4.12. Three prisoners, whom we may call A, B, and C, are informed by their jailer that one of them has been chosen at random to be executed, and the other 2 are to be freed. Prisoner A, who has studied probability theory, then reasons to himself that he has probability 1/3 of being executed. He then asks the jailer to tell him privately which of his fellow prisoners will be set free, claiming that there would be no harm in divulging this information, since he already knows that at least 1 will go. The jailer (being an ethical fellow) refuses to reply to this question, pointing out that if A knew which of his fellows were to be set free then his probability of being executed would increase to 1/2, since he would then be 1 of 2 prisoners, 1 of whom is to be executed. Show that the probability that A will be executed is still 1/3, even if the jailer were to answer his question, assuming that, in the event that A is to be executed, the jailer is as likely to say that B is to be set free as he is to say that C is to be set free.


4.13. A male rat is either doubly dominant (AA) or heterozygous (Aa); owing to Mendelian properties, the probability of either being true is 1/2. The male rat is bred to a doubly recessive (aa) female. If the male rat is doubly dominant, the offspring will exhibit the dominant characteristic; if heterozygous, the offspring will exhibit the dominant characteristic 1/2 of the time and the recessive characteristic 1/2 of the time. Suppose all of 3 offspring exhibit the dominant characteristic. What is the probability that the male is doubly dominant?

4.14. Consider an urn that contains 5 white and 7 black balls. A ball is drawn and its color is noted. It is then replaced; in addition, 3 balls of the color drawn are added to the urn. A ball is then drawn from the urn. Find the probability that (i) the second ball drawn will be black, (ii) both balls drawn will be black.

4.15. Consider a sample of size 3 drawn in the following manner. One starts with an urn containing 5 white and 7 red balls. At each trial a ball is drawn and its color is noted. The ball drawn is then returned to the urn, together with an additional ball of the same color. Find the probability that the sample will contain exactly (i) 0 white balls, (ii) 1 white ball, (iii) 3 white balls.
4.16. A certain kind of nuclear particle splits into 0, 1, or 2 new particles (which we call offsprings) with probabilities [illegible], respectively. The individual particles act independently of each other. Given a particle, let $X_1$ denote the number of its offsprings, let $X_2$ denote the number of offsprings of its offsprings, and let $X_3$ denote the number of offsprings of the offsprings of its offsprings.

(i) Find the probability that $X_2 > 0$.

(ii) Find the conditional probability that $X_1 = 1$, given that $X_2 = 1$.

(iii) Find the probability that $X_3 = 0$.


4.17. A number, denoted by $X_1$, is chosen at random from the set of integers $\{1, 2, 3, 4\}$. A second number, denoted by $X_2$, is chosen at random from the set $\{1, 2, \ldots, X_1\}$.

(i) For each integer $k$, 1 to 4, find the conditional probability that $X_2 = 1$, given that $X_1 = k$.

(ii) Find the probability that $X_2 = 1$.

(iii) Find the conditional probability that $X_1 = 2$, given that $X_2 = 1$.
5. MARKOV DEPENDENT BERNOULLI TRIALS 


Of interest in many problems of applied probability theory is the evolu- 
tion in time of the state of a random phenomenon. For example, suppose 
one has two urns (I and II), each of which contains one white and one 
black ball. One ball is drawn simultaneously from each urn and placed 
in the other urn. One is often concerned with questions such as, what is 
the probability that after 100 repetitions of this procedure urn I will 
contain two white balls? The theory of Markov* chains is applicable to 
questions of this type. 

The theory of Markov chains relates to every field of physical and social 
science (see the forthcoming book by A. T. Bharucha-Reid, Introduction 
to the Theory of Markov Processes and their Applications; for applications 
of the theory of Markov chains to the description of social or psychological 
phenomena, see the book by Kemeny and Snell cited in the next paragraph). 
There is an immense literature concerning the theory of Markov chains. 
In this section and the next we can provide only a brief introduction. 

Excellent elementary accounts of this theory are to be found in the works of W. Feller, An Introduction to Probability Theory and Its Applications, second edition, Wiley, New York, 1957, and J. G. Kemeny and J. L. Snell, Finite Markov Chains, Van Nostrand, Princeton, New Jersey, 1959. The reader is referred to these books for proofs of the assertions made in section 6.

The natural generalization of the notion of independent Bernoulli trials is the notion of Markov dependent Bernoulli trials. Given $n$ trials of an experiment, which has only two possible outcomes (denoted by s or f, for "success" or "failure"), we recall that they are said to be independent Bernoulli trials if for any integer $k$ (1 to $n - 1$) and $k + 1$ events $A_1, A_2, \ldots, A_{k+1}$ depending, respectively, on the first, second, ..., $(k + 1)$st trials,

$$P[A_{k+1} \mid A_1 A_2 \cdots A_k] = P[A_{k+1}]. \tag{5.1}$$

We define the trials as Markov dependent Bernoulli trials if, instead of (5.1), it holds that

$$P[A_{k+1} \mid A_1 A_2 \cdots A_k] = P[A_{k+1} \mid A_k]. \tag{5.2}$$


In words, (5.2) says that at the $k$th trial the conditional probability of any event $A_{k+1}$, depending on the next trial, will not depend on what has happened in past trials but only on what is happening at the present time. One sometimes says that the trials have no memory.


* The theory of Markov chains derives its name from the celebrated Russian 
probabilist, A. A. Markov (1856-1922). 




Suppose that the quantities

(5.3)
  $P(s, s)$ = probability of success on the $(k + 1)$st trial, given that there was success on the $k$th trial,
  $P(f, s)$ = probability of success at the $(k + 1)$st trial, given that there was failure at the $k$th trial,
  $P(f, f)$ = probability of failure at the $(k + 1)$st trial, given that there was failure at the $k$th trial,
  $P(s, f)$ = probability of failure at the $(k + 1)$st trial, given that there was success at the $k$th trial,

are independent of $k$. We then say that the trials are Markov dependent repeated Bernoulli trials.


► Example 5A. Let the weather be observed on $n$ consecutive days. Let s describe a day on which rain falls, and let f describe a day on which no rain falls. Suppose one assumes that weather observations constitute a series of Markov dependent Bernoulli trials (or, in the terminology of section 6, a Markov chain with two states). Then $P(s, s)$ is the probability of rain tomorrow, given rain today; $P(s, f)$ is the probability of no rain tomorrow, given rain today; $P(f, f)$ is the probability of no rain tomorrow, given no rain today; and $P(f, s)$ is the probability of rain tomorrow, given no rain today. It is now natural to ask for such probabilities as that of rain the day after tomorrow, given no rain today; we denote this probability by $P_2(f, s)$ and obtain a formula for it.
In the case of independent repeated Bernoulli trials the probability function $P[\cdot]$ on the sample description space of the $n$ trials is completely specified once we have the probability $p$ of success at each trial. In the case of Markov dependent repeated Bernoulli trials it suffices to specify the quantities

(5.4)
  $p_1(s)$ = probability of success at the first trial,
  $p_1(f)$ = probability of failure at the first trial,

as well as the conditional probabilities in (5.3). The probability of any event can be computed in terms of these quantities. For example, for $k = 1, 2, \ldots, n$ let

(5.5)
  $p_k(s)$ = probability of success at the $k$th trial,
  $p_k(f)$ = probability of failure at the $k$th trial.



The quantities $p_k(s)$ satisfy the following equations for $k = 2, 3, \ldots, n$:

$$\begin{aligned} p_k(s) &= p_{k-1}(s)\,P(s, s) + p_{k-1}(f)\,P(f, s) \\ &= p_{k-1}(s)\,P(s, s) + [1 - p_{k-1}(s)][1 - P(f, f)] \\ &= p_{k-1}(s)[P(s, s) + P(f, f) - 1] + [1 - P(f, f)]. \end{aligned} \tag{5.6}$$

To justify (5.6), we reason as follows: if $A_k$ is the event that there is success on the $k$th trial, then

$$P[A_k] = P[A_{k-1}]\,P[A_k \mid A_{k-1}] + P[A_{k-1}^c]\,P[A_k \mid A_{k-1}^c]. \tag{5.7}$$

From (5.7) and the fact that

$$P(s, f) = 1 - P(s, s), \qquad P(f, s) = 1 - P(f, f), \qquad p_k(f) = 1 - p_k(s), \tag{5.8}$$

one obtains (5.6).
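The remark that any event's probability can be computed from (5.3) and (5.4) can be made concrete: the probability of a particular sequence of outcomes is $p_1(z_1)$ times the product of the one-step probabilities along the sequence, which follows from (5.2) and the multiplication rule. The minimal sketch below uses this to compute $p_k(s)$ by brute-force enumeration and compares it with the recursion (5.6). The parameter values and function names are hypothetical, chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

def sequence_prob(seq, p1_s, P_ss, P_ff):
    """Probability of one outcome sequence of 's'/'f', built from (5.3) and (5.4)."""
    P = {('s', 's'): P_ss, ('s', 'f'): 1 - P_ss,
         ('f', 'f'): P_ff, ('f', 's'): 1 - P_ff}
    prob = p1_s if seq[0] == 's' else 1 - p1_s
    for prev, cur in zip(seq, seq[1:]):
        prob *= P[(prev, cur)]
    return prob


def p_k_by_enumeration(k, p1_s, P_ss, P_ff):
    """p_k(s): add up all length-k sequences whose kth outcome is a success."""
    return sum(sequence_prob(seq, p1_s, P_ss, P_ff)
               for seq in product('sf', repeat=k) if seq[-1] == 's')


def p_k_by_recursion(k, p1_s, P_ss, P_ff):
    """p_k(s) from the difference equation (5.6)."""
    p = p1_s
    for _ in range(k - 1):
        p = p * (P_ss + P_ff - 1) + (1 - P_ff)
    return p


# Hypothetical parameters, chosen only for illustration
P_ss, P_ff, p1_s = Fraction(3, 4), Fraction(1, 2), Fraction(1, 5)
print(all(p_k_by_enumeration(k, p1_s, P_ss, P_ff) == p_k_by_recursion(k, p1_s, P_ss, P_ff)
          for k in range(1, 9)))
```

Enumeration grows like $2^k$, which is why the recursion (5.6) is the practical route.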
Equation (5.6) constitutes a recursive relationship for $p_k(s)$, known as a difference equation. Throughout this section we make the assumption that

$$|P(s, s) + P(f, f) - 1| < 1. \tag{5.9}$$

By using theoretical exercise 4.5, it follows from (5.6) and (5.9) that for $k = 1, 2, \ldots, n$

$$p_k(s) = \left[p_1(s) - \frac{1 - P(f, f)}{2 - P(s, s) - P(f, f)}\right][P(s, s) + P(f, f) - 1]^{k-1} + \frac{1 - P(f, f)}{2 - P(s, s) - P(f, f)}. \tag{5.10}$$

By interchanging the role of s and f in (5.10), we obtain, similarly, for $k = 1, 2, \ldots, n$

$$p_k(f) = \left[p_1(f) - \frac{1 - P(s, s)}{2 - P(s, s) - P(f, f)}\right][P(s, s) + P(f, f) - 1]^{k-1} + \frac{1 - P(s, s)}{2 - P(s, s) - P(f, f)}. \tag{5.11}$$

It is readily verifiable that the expressions in (5.10) and (5.11) sum to one, as they ought.
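The claim that (5.10) and (5.11) sum to one, and that (5.10) agrees with the recursion (5.6), can be checked in exact arithmetic. This is a minimal sketch; the parameter values are hypothetical and satisfy (5.9), and `p_k_closed` is an illustrative name.

```python
from fractions import Fraction

def p_k_closed(k, p1_s, P_ss, P_ff):
    """(p_k(s), p_k(f)) from the closed forms (5.10) and (5.11); assumes (5.9)."""
    a = P_ss + P_ff - 1
    d = 2 - P_ss - P_ff
    p1_f = 1 - p1_s
    pk_s = (p1_s - (1 - P_ff) / d) * a ** (k - 1) + (1 - P_ff) / d
    pk_f = (p1_f - (1 - P_ss) / d) * a ** (k - 1) + (1 - P_ss) / d
    return pk_s, pk_f


P_ss, P_ff, p1_s = Fraction(3, 4), Fraction(1, 2), Fraction(1, 5)   # hypothetical
print(all(sum(p_k_closed(k, p1_s, P_ss, P_ff)) == 1 for k in range(1, 11)))
```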


In many problems involving Markov dependent repeated Bernoulli trials we do not know the probability $p_1(s)$ of success at the first trial. We can only compute the quantities

(5.12)
  $P_k(s, s)$ = conditional probability of success at the $(k + 1)$st trial, given success at the first trial,
  $P_k(s, f)$ = conditional probability of failure at the $(k + 1)$st trial, given success at the first trial,
  $P_k(f, f)$ = conditional probability of failure at the $(k + 1)$st trial, given failure at the first trial,
  $P_k(f, s)$ = conditional probability of success at the $(k + 1)$st trial, given failure at the first trial.

Since

$$P_k(s, f) = 1 - P_k(s, s), \qquad P_k(f, s) = 1 - P_k(f, f), \tag{5.13}$$

it suffices to obtain formulas for $P_k(s, s)$ and $P_k(f, f)$.
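In the same way that we obtained (5.6) we obtain

$$\begin{aligned} P_k(s, s) &= P_{k-1}(s, s)\,P(s, s) + P_{k-1}(s, f)\,P(f, s) \\ &= P_{k-1}(s, s)\,P(s, s) + [1 - P_{k-1}(s, s)][1 - P(f, f)] \\ &= P_{k-1}(s, s)[P(s, s) + P(f, f) - 1] + [1 - P(f, f)]. \end{aligned} \tag{5.14}$$

By using theoretical exercise 4.5, it follows from (5.14) and (5.9) that for $k = 1, 2, \ldots, n$

$$P_k(s, s) = \left[P(s, s) - \frac{1 - P(f, f)}{2 - P(s, s) - P(f, f)}\right][P(s, s) + P(f, f) - 1]^{k-1} + \frac{1 - P(f, f)}{2 - P(s, s) - P(f, f)}, \tag{5.15}$$

which can be simplified to

$$P_k(s, s) = \frac{1 - P(s, s)}{2 - P(s, s) - P(f, f)}[P(s, s) + P(f, f) - 1]^{k} + \frac{1 - P(f, f)}{2 - P(s, s) - P(f, f)}. \tag{5.16}$$

By interchanging the role of s and f, we obtain, similarly,

$$P_k(f, f) = \frac{1 - P(f, f)}{2 - P(s, s) - P(f, f)}[P(s, s) + P(f, f) - 1]^{k} + \frac{1 - P(s, s)}{2 - P(s, s) - P(f, f)}. \tag{5.17}$$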


By using (5.13), we obtain

$$P_k(s, f) = -\frac{1 - P(s, s)}{2 - P(s, s) - P(f, f)}[P(s, s) + P(f, f) - 1]^{k} + \frac{1 - P(s, s)}{2 - P(s, s) - P(f, f)}, \tag{5.18}$$

$$P_k(f, s) = -\frac{1 - P(f, f)}{2 - P(s, s) - P(f, f)}[P(s, s) + P(f, f) - 1]^{k} + \frac{1 - P(f, f)}{2 - P(s, s) - P(f, f)}. \tag{5.19}$$

Equations (5.16) to (5.19) represent the basic conclusions in the theory of Markov dependent Bernoulli trials (in the case that (5.9) holds).
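One way to check (5.16) to (5.19) is to arrange the one-step probabilities of (5.3) in a 2 × 2 matrix with rows and columns ordered (s, f). Repeating the reasoning of (5.14) shows that $P_k(x, y)$ is the $(x, y)$ entry of the $k$th power of that matrix. This matrix formulation is our own addition here; the text reaches the same formulas through the recursion (5.14). The sketch below uses hypothetical parameter values and plain nested lists, so no external library is assumed.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(X[i][t] * Y[t][j] for t in range(2)) for j in range(2)]
            for i in range(2)]


def k_step_by_matrix_power(P_ss, P_ff, k):
    """k-th power of the one-step matrix; rows and columns are ordered (s, f)."""
    one_step = [[P_ss, 1 - P_ss], [1 - P_ff, P_ff]]
    result = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]
    for _ in range(k):
        result = mat_mul(result, one_step)
    return result


def k_step_by_formulas(P_ss, P_ff, k):
    """[[P_k(s,s), P_k(s,f)], [P_k(f,s), P_k(f,f)]] from (5.16)-(5.19)."""
    a = P_ss + P_ff - 1
    d = 2 - P_ss - P_ff
    ss = (1 - P_ss) / d * a ** k + (1 - P_ff) / d
    ff = (1 - P_ff) / d * a ** k + (1 - P_ss) / d
    sf = -(1 - P_ss) / d * a ** k + (1 - P_ss) / d
    fs = -(1 - P_ff) / d * a ** k + (1 - P_ff) / d
    return [[ss, sf], [fs, ff]]


P_ss, P_ff = Fraction(3, 4), Fraction(1, 2)    # hypothetical values satisfying (5.9)
print(all(k_step_by_matrix_power(P_ss, P_ff, k) == k_step_by_formulas(P_ss, P_ff, k)
          for k in range(1, 9)))
```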


► Example 5B. Consider a communications system which transmits the digits 0 and 1. Each digit transmitted must pass through several stages, at each of which there is a probability $p$ that the digit that enters will be unchanged when it leaves. Suppose that the system consists of three stages. What is the probability that a digit entering the system as 0 will be (i) transmitted by the third stage as 0, (ii) transmitted by each stage as 0 (that is, never changed from 0)? Evaluate these probabilities for $p = 1/3$.

Solution: In observing the passage of the digit through the communications system, we are observing a 4-tuple $(z_1, z_2, z_3, z_4)$, whose first component $z_1$ is 1 or 0, depending on whether the digit entering the system is 1 or 0. For $i = 2, 3, 4$ the component $z_i$ is equal to 1 or 0, depending on whether the digit leaving the $(i - 1)$st stage is 1 or 0. We now use the foregoing formulas, identifying s with 1, say, and f with 0. Our basic assumption is that

$$P(0, 0) = P(1, 1) = p. \tag{5.20}$$

The probability that a digit entering the system as 0 will be transmitted by the third stage as 0 is given by (5.17):

$$\begin{aligned} P_3(0, 0) &= \frac{1 - P(0, 0)}{2 - P(0, 0) - P(1, 1)}[P(0, 0) + P(1, 1) - 1]^3 + \frac{1 - P(1, 1)}{2 - P(0, 0) - P(1, 1)} \\ &= \frac{1 - p}{2 - 2p}(2p - 1)^3 + \frac{1 - p}{2 - 2p} \\ &= \tfrac{1}{2}[1 + (2p - 1)^3]. \end{aligned} \tag{5.21}$$


If $p = 1/3$, then $P_3(0, 0) = \tfrac{1}{2}[1 - (1/3)^3] = 13/27$. The probability that a digit entering the system as 0 will be transmitted by each stage as 0 is given by the product

$$P(0, 0)P(0, 0)P(0, 0) = p^3 = (1/3)^3 = 1/27.$$

◄
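Both answers of example 5B can be confirmed by listing the $2^3$ patterns of "stage keeps" and "stage changes" the digit. This is a minimal sketch; `three_stage` is an illustrative name and the enumeration is our own check, not the method of the text.

```python
from fractions import Fraction
from itertools import product

def three_stage(p):
    """Example 5B by enumeration: a 0 enters; each stage keeps the digit w.p. p.

    Returns (P[third stage emits 0], P[every stage emits 0]).
    """
    ends_as_zero = Fraction(0)
    for changes in product((False, True), repeat=3):   # True = that stage flips the digit
        prob = Fraction(1)
        digit = 0
        for changed in changes:
            prob *= (1 - p) if changed else p
            if changed:
                digit = 1 - digit
        if digit == 0:
            ends_as_zero += prob
    never_changed = p ** 3
    return ends_as_zero, never_changed


print(three_stage(Fraction(1, 3)))   # (Fraction(13, 27), Fraction(1, 27))
```

The digit leaves as 0 exactly when it is changed an even number of times, which is what the factor $(2p - 1)^3$ in (5.21) encodes.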
► Example 5C. Suppose that the digit transmitted through the communications system described in example 5B (with $p = 1/3$) is chosen by a chance mechanism; digit 0 is chosen with probability 1/3 and digit 1, with probability 2/3. What is the conditional probability that a digit transmitted by the third stage as 0 in fact entered the system as 0?

Solution: The conditional probability that a digit transmitted by the third stage as 0 entered the system as 0 is given by

$$\frac{p_1(0)\,P_3(0, 0)}{p_4(0)}. \tag{5.22}$$

To justify (5.22), note that $p_1(0)P_3(0, 0)$ is the probability that the first digit is 0 and the fourth digit is 0, whereas $p_4(0)$ is the probability that the fourth digit is 0. Now, under the assumption that

$$P(0, 0) = P(1, 1) = \tfrac{1}{3}, \qquad p_1(0) = \tfrac{1}{3}, \qquad p_1(1) = \tfrac{2}{3},$$

it follows from (5.10), with 0 playing the role of s, that

$$\begin{aligned} p_4(0) &= \left[p_1(0) - \frac{1 - P(1, 1)}{2 - P(0, 0) - P(1, 1)}\right][P(0, 0) + P(1, 1) - 1]^3 + \frac{1 - P(1, 1)}{2 - P(0, 0) - P(1, 1)} \\ &= \left(\tfrac{1}{3} - \tfrac{1}{2}\right)\left(-\tfrac{1}{3}\right)^3 + \tfrac{1}{2} = \tfrac{41}{81}. \end{aligned} \tag{5.23}$$

From (5.21) and (5.23) it follows that the conditional probability that a digit leaving the system as 0 entered the system as 0 is given by

$$\frac{(1/3)(13/27)}{41/81} = \frac{13}{41}.$$

◄
► Example 5D. Let us use the considerations of examples 5B and 5C to solve the following problem, first proposed by the celebrated cosmologist A. S. Eddington (see W. Feller, "The problem of n liars and Markov chains," American Mathematical Monthly, Vol. 58 (1951), pp. 606-608). "If A, B, C, D each speak the truth once in 3 times (independently), and A affirms that B denies that C declares that D is a liar, what is the probability that D was telling the truth?"

Solution: We consider a sample description space of 4-tuples $(z_1, z_2, z_3, z_4)$ in which $z_1$ equals 0 or 1, depending on whether D is truthful or a liar, $z_2$ equals 0 or 1, depending on whether the statement made by C implies that D is truthful or a liar, $z_3$ equals 0 or 1, depending on whether the statement made by B implies that D is truthful or a liar, and $z_4$ equals 0 or 1, depending on whether the statement made by A implies that D is truthful or a liar. The sample description space thus defined constitutes a series of Markov dependent repeated Bernoulli trials with

$$P(0, 0) = P(1, 1) = \tfrac{1}{3}, \qquad p_1(0) = \tfrac{1}{3}, \qquad p_1(1) = \tfrac{2}{3}.$$

The persons A, B, C, and D can be regarded as forming a communications system. We are seeking the conditional probability that the digit entering the system was 0 (which is equivalent to D being truthful), given that the digit transmitted by the third stage was 0 (if A affirms that B denies that C declares that D is a liar, then A is asserting that D is truthful). In view of example 5C, the required probability is 13/41. ◄
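The answer 13/41 can also be reached without the formulas of this section, by enumerating the $2^4$ truthful/lying patterns of D, C, B, A under the model of the solution above: each speaker passes on the implication about D unchanged if truthful and reverses it if lying. This is a minimal sketch of that model only; it does not address other readings of Eddington's puzzle, and the function name is illustrative.

```python
from fractions import Fraction
from itertools import product

def eddington(p_truth=Fraction(1, 3)):
    """P[D truthful | A's statement implies D is truthful], under the model of example 5D.

    Each of D, C, B, A is truthful independently with probability p_truth.
    C's statement passes on D's true status if C is truthful and reverses it
    otherwise; B and A act in the same way on the statement they report.
    """
    joint = Fraction(0)      # P[D truthful and A's statement implies D truthful]
    marginal = Fraction(0)   # P[A's statement implies D truthful]
    for d, c, b, a in product((True, False), repeat=4):   # True = truthful
        prob = Fraction(1)
        for honest in (d, c, b, a):
            prob *= p_truth if honest else 1 - p_truth
        implied = d          # z_1: the digit entering the system
        for honest in (c, b, a):
            implied = implied if honest else not implied
        if implied:
            marginal += prob
            if d:
                joint += prob
    return joint / marginal


print(eddington())   # 13/41
```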


Statistical Equilibrium. For large values of $k$ the values of $p_k(s)$ and $p_k(f)$ are approximately given by

$$p_k(s) \approx \frac{1 - P(f, f)}{2 - P(s, s) - P(f, f)}, \qquad p_k(f) \approx \frac{1 - P(s, s)}{2 - P(s, s) - P(f, f)}. \tag{5.24}$$

To justify (5.24), use (5.10) and (5.11) and the fact that (5.9) implies that

$$\lim_{k \to \infty} [P(s, s) + P(f, f) - 1]^k = 0.$$

Equation (5.24) has the following significant interpretation: After a large number of Markov dependent repeated Bernoulli trials has been performed, one is in a state of statistical equilibrium, in the sense that the probability $p_k(s)$ of success on the $k$th trial is the same for all large values of $k$ and indeed is functionally independent of the initial conditions, represented by $p_1(s)$.

From (5.24) one sees that the trials are asymptotically fair in the sense that approximately

$$p_k(s) = p_k(f) = \tfrac{1}{2} \tag{5.25}$$

for large values of $k$ if and only if

$$P(s, s) = P(f, f). \tag{5.26}$$

► Example 5E. If the communications system described in example 5B consists of a very large number of stages, then half the digits transmitted by the system will be 0's, irrespective of the proportion of 0's among the digits entering the system. ◄
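Statistical equilibrium is easy to observe by iterating (5.6) from several different initial values $p_1(s)$. This is a minimal sketch in floating point; the parameter values are hypothetical and satisfy (5.9), and the function names are illustrative.

```python
def p_k_s(k, p1_s, P_ss, P_ff):
    """p_k(s) by iterating the difference equation (5.6)."""
    p = p1_s
    for _ in range(k - 1):
        p = p * (P_ss + P_ff - 1) + (1 - P_ff)
    return p


def equilibrium(P_ss, P_ff):
    """Limiting value of p_k(s) from (5.24)."""
    return (1 - P_ff) / (2 - P_ss - P_ff)


P_ss, P_ff = 0.7, 0.4          # hypothetical values satisfying (5.9)
for p1 in (0.0, 0.5, 1.0):     # three different initial conditions p_1(s)
    print(p1, p_k_s(50, p1, P_ss, P_ff), equilibrium(P_ss, P_ff))

# Example 5E: P(s, s) = P(f, f) gives the value 1/2 of (5.25)
print(equilibrium(1 / 3, 1 / 3))
```

Whatever the starting value, the iterates settle at the same limit, since the distance from it is multiplied by $P(s, s) + P(f, f) - 1$ at every step.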


EXERCISES


5.1. Consider a series of Markov dependent Bernoulli trials such that $P(s, s)$ = [illegible], $P(f, f)$ = [illegible]. Find $P_3(s, f)$, $P_3(f, s)$.


5.2. Consider a series of Markov dependent Bernoulli trials such that $P(s, s)$ = [illegible], $P(f, f)$ = [illegible], $p_1(s)$ = [illegible]. Find $p_3(s)$, $p_3(f)$.


5.3. Consider a series of Markov dependent Bernoulli trials such that $P(s, s)$ = [illegible], $P(f, f)$ = [illegible], $p_1(s)$ = [illegible]. Find the conditional probability of a success at the first trial, given that there was a success at the fourth trial.


5.4. Consider a series of Markov dependent Bernoulli trials such that $P(s, s)$ = [illegible], $P(f, f)$ = [illegible]. Find $\lim_{n \to \infty} p_n(s)$.


5.5. If A, B, C, and D each speak the truth once in 3 times (independently), and A affirms that B denies that C denies that D is a liar, what is the probability that D was telling the truth?


5.6. Suppose the probability is equal to $p$ that the weather (rain or no rain) on any arbitrary day is the same as on the preceding day. Let $p_1$ be the probability of rain on the first day of the year. Find the probability $p_n$ of rain on the $n$th day. Evaluate the limit of $p_n$ as $n$ tends to infinity.
5.7. Consider a game played as follows: a group of $n$ persons is arranged in a line. The first person starts a rumor by telling his neighbor that the last person in line is a nonconformist. Each person in line then repeats this rumor to his neighbor; however, with probability $p > 0$, he reverses the sense of the rumor as it is told to him. What is the probability that the last person in line will be told he is a nonconformist if (i) $n = 5$, (ii) $n = 6$, (iii) $n$ is very large?
Suppose you are confronted with 2 coins, А and В. You are to make л 
tosses, using the coin you prefer at each toss. You will be paid 1 dollar 
for each time the coin falls heads. Coin A has probability 3 of falling 
heads, and coin B has probability 1 of falling heads. Unfortunately, 
you are not told which of the coins is coin А. Consequently, you decide 
io toss the coins according to the following system. For the first toss 
you choose a coin at random. For all succeeding tosses you select the coin 
used on the preceding toss if it fell heads, and otherwise switch coins. 
What is the probability that coin A will be the coin tossed on the mth toss 
if (i) n = 2, (ii) n = 4, (iii) n = 6, (iv) nis very large? What is the proba- 
bility that the coin tossed on the nth toss will fall heads if (i) n —2, 
(ii) n = 4, (iii) t = 6, (iv) n is very large? Hint: On each trial let s denote 
the use of coin А and f denote the use of coin В. 
Г i ing wooed by a certain young man. The young 
A сер уб on in pus e ponte tud often. "t E is late on 1 date, 
нду ксы. vat time on the next date. If she is on time, then 
Um is 90 еш Fher being late on the next date. In the long run, 
ere is a 60% cha 
how often is she late? 


5.10. Suppose that people in a certain group may be classified into 2 categories (say, city dwellers and country dwellers; Republicans and Democrats; Easterners and Westerners; skilled and unskilled workers, and so on). Let us consider a group of engineers, some of whom are Easterners and some of whom are Westerners. Suppose that each person has a certain probability of changing his status: The probability that an Easterner will become a Westerner is 0.04, whereas the probability that a Westerner will become an Easterner is 0.01. In the long run, what proportion of the group will be (i) Easterners, (ii) Westerners, (iii) will move from East to West in a given year, (iv) will move from West to East in a given year? Comment on your answers.


6. MARKOV CHAINS 


The notion of Markov dependence, defined in section 5 for Bernoulli trials, may be extended to trials with several possible outcomes. Consider $n$ trials of an experiment with $r$ possible outcomes $s_1, s_2, \ldots, s_r$, in which $r > 2$. For $k = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, r$ let $A_j^{(k)}$ be the event that on the $k$th trial outcome $s_j$ occurred. The trials are defined as Markov dependent if for any integer $k$ from 1 to $n$ and integers $j_1, j_2, \ldots, j_k$ from 1 to $r$, the events $A_{j_1}^{(1)}, A_{j_2}^{(2)}, \ldots, A_{j_k}^{(k)}$ satisfy the condition

$$P\big[A_{j_k}^{(k)} \mid A_{j_1}^{(1)} A_{j_2}^{(2)} \cdots A_{j_{k-1}}^{(k-1)}\big] = P\big[A_{j_k}^{(k)} \mid A_{j_{k-1}}^{(k-1)}\big]. \tag{6.1}$$


In discussing Markov dependent trials with $r$ possible outcomes, it is usual to employ an intuitively meaningful language. Instead of speaking of $n$ trials of an experiment with $r$ possible outcomes, we speak of observing at $n$ times the state of a system which has $r$ possible states. We number the states $1, 2, \ldots, r$ (or sometimes $0, 1, \ldots, r - 1$) and let $A_j^{(k)}$ be the event that the system is in state $j$ at time $k$. If (6.1) holds, we say that the system is a Markov chain with $r$ possible states. In words, (6.1) states that at any time the conditional probability of transition from one's present state to any other state does not depend on how one arrived in one's present state. One sometimes says that a Markov chain is a system without memory of the past.


Now suppose that for any states $i$ and $j$ the conditional probability

(6.2) $P(i, j)$ = conditional probability that the Markov chain is at time $t$ in state $j$, given that at time $(t - 1)$ it was in state $i$,

is independent of $t$. The Markov chain is then said to be homogeneous (or time homogeneous). A homogeneous Markov chain with $r$ states corresponds to the notion of Markov dependent repeated trials with $r$ possible outcomes. In this section, by a Markov chain we mean a homogeneous Markov chain.
The $m$-step transition probabilities defined by

(6.3) $P_m(i, j)$ = the conditional probability that the Markov chain is at time $t + m$ in state $j$, given that at time $t$ it was in state $i$,

are given recursively in terms of $P(i, j)$ by the system of equations, for $m = 2, 3, \ldots$,

$$P_m(i, j) = P_{m-1}(i, 1)P(1, j) + P_{m-1}(i, 2)P(2, j) + \cdots + P_{m-1}(i, r)P(r, j) = \sum_{k=1}^{r} P_{m-1}(i, k)P(k, j). \tag{6.4}$$


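To justify (6.4), write it in the form

$$P\big[A_j^{(t+m)} \mid A_i^{(t)}\big] = \sum_{k=1}^{r} P\big[A_j^{(t+m)} \mid A_k^{(t+m-1)}\big]\, P\big[A_k^{(t+m-1)} \mid A_i^{(t)}\big];$$

recall that $A_j^{(t)}$ is the event that at time $t$ the system is in state $j$. One may similarly prove

$$P_m(i, j) = \sum_{k=1}^{r} P(i, k)P_{m-1}(k, j). \tag{6.5}$$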


The unconditional probabilities

(6.6) $p_n(j)$ = the probability that at time $n$ the Markov chain is in state $j$,

are given for $n \ge 2$ in terms of the initial unconditional probabilities $p_1(j)$ by

$$p_n(j) = \sum_{k=1}^{r} p_1(k)P_{n-1}(k, j). \tag{6.7}$$

One proves (6.7) in exactly the same way that one proves (6.4) and (6.5). Similarly, one may show that

$$p_n(j) = \sum_{k=1}^{r} p_{n-1}(k)P(k, j). \tag{6.8}$$
The transition probabilities $P(i, j)$ of a Markov chain with $r$ states are best exhibited in the form of a matrix:

$$P = \begin{bmatrix} P(1,1) & P(1,2) & P(1,3) & \cdots & P(1,r-1) & P(1,r) \\ P(2,1) & P(2,2) & P(2,3) & \cdots & P(2,r-1) & P(2,r) \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ P(r-1,1) & P(r-1,2) & P(r-1,3) & \cdots & P(r-1,r-1) & P(r-1,r) \\ P(r,1) & P(r,2) & P(r,3) & \cdots & P(r,r-1) & P(r,r) \end{bmatrix}. \tag{6.9}$$

The matrix $P$ is said to be an $r \times r$ matrix, since it has $r$ rows and $r$ columns.
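Given an $m \times r$ matrix $A$ and an $r \times n$ matrix $B$,

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1r} \\ a_{21} & a_{22} & \cdots & a_{2r} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mr} \end{bmatrix}, \qquad B = \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ b_{21} & b_{22} & \cdots & b_{2n} \\ \vdots & \vdots & & \vdots \\ b_{r1} & b_{r2} & \cdots & b_{rn} \end{bmatrix},$$

we define the product $C = AB$ of the two matrices as the $m \times n$ matrix whose element $c_{ij}$, lying at the intersection of the $i$th row and the $j$th column, is given by

$$c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{ir}b_{rj} = \sum_{k=1}^{r} a_{ik}b_{kj}. \tag{6.10}$$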


It should be noted that matrix multiplication is associative; A(BC) = 
(AB)C for any matrices A, B, and C. 
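Definition (6.10) translates directly into code. This is a minimal Python sketch using lists of rows. The name `matmul` and the small integer matrices are illustrative, and the last line spot-checks associativity on one example (it does not prove it).

```python
def matmul(A, B):
    """Return C = AB for an m x r matrix A and an r x n matrix B (lists of rows),
    with c_ij = a_i1 b_1j + a_i2 b_2j + ... + a_ir b_rj as in (6.10)."""
    r = len(B)
    if any(len(row) != r for row in A):
        raise ValueError("A must have as many columns as B has rows")
    n = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(r)) for j in range(n)]
            for i in range(len(A))]


A = [[1, 2, 3], [4, 5, 6]]    # 2 x 3
B = [[1, 0], [0, 1], [1, 1]]  # 3 x 2
C = [[2, 1], [0, 3]]          # 2 x 2

print(matmul(A, B))                                        # [[4, 5], [10, 11]]
print(matmul(matmul(A, B), C) == matmul(A, matmul(B, C)))  # True
```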

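If we define the $m$-step transition probability matrix $P_m$ of a Markov chain by

$$P_m = \begin{bmatrix} P_m(1,1) & P_m(1,2) & \cdots & P_m(1,r) \\ P_m(2,1) & P_m(2,2) & \cdots & P_m(2,r) \\ \vdots & \vdots & & \vdots \\ P_m(r,1) & P_m(r,2) & \cdots & P_m(r,r) \end{bmatrix}, \tag{6.11}$$

we see that (6.4) and (6.5) may be concisely expressed:

$$P_m = P_{m-1}P = PP_{m-1}, \qquad m = 2, 3, \ldots. \tag{6.12}$$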
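As a sketch of (6.12), the following Python code builds $P_m$ by repeated right multiplication and compares it with left multiplication. The 3-state matrix is hypothetical. Exact `fractions.Fraction` arithmetic is used so that the comparison is exact rather than approximate.

```python
from fractions import Fraction as F


def matmul(A, B):
    # c_ij = sum over k of a_ik b_kj, as in (6.10)
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def m_step(P, m):
    """P_m built from (6.12): P_1 = P and P_m = P_{m-1} P for m = 2, 3, ..."""
    Pm = P
    for _ in range(m - 1):
        Pm = matmul(Pm, P)
    return Pm


# Hypothetical 3-state transition matrix; each row sums to 1.
P = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0), F(1, 3), F(2, 3)]]

P3 = m_step(P, 3)
print(P3 == matmul(P, m_step(P, 2)))     # True: the other form P P_{m-1}
print(all(sum(row) == 1 for row in P3))  # True: rows of P_m are distributions
```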

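► Example 6A. If the transition probability matrix $P$ of a Markov chain is given by

$$P = \begin{bmatrix} 0 & \tfrac13 & \tfrac23 \\ \tfrac23 & 0 & \tfrac13 \\ \tfrac13 & \tfrac23 & 0 \end{bmatrix},$$

then the chain consists of three states, since $P$ is a $3 \times 3$ matrix, and the 2-step and 3-step transition probability matrices are given by

$$P_2 = \begin{bmatrix} \tfrac49 & \tfrac49 & \tfrac19 \\ \tfrac19 & \tfrac49 & \tfrac49 \\ \tfrac49 & \tfrac19 & \tfrac49 \end{bmatrix}, \qquad P_3 = \begin{bmatrix} \tfrac13 & \tfrac29 & \tfrac49 \\ \tfrac49 & \tfrac13 & \tfrac29 \\ \tfrac29 & \tfrac49 & \tfrac13 \end{bmatrix}.$$

If the initial unconditional probabilities are assumed to be given by

$$p_1 = (p_1(1), p_1(2), p_1(3)) = \left(\tfrac12, \tfrac14, \tfrac14\right),$$

then

$$\begin{aligned} p_2 &= (p_2(1), p_2(2), p_2(3)) = p_1P = \left(\tfrac14, \tfrac13, \tfrac{5}{12}\right) \\ p_3 &= (p_3(1), p_3(2), p_3(3)) = p_1P_2 = p_2P = \left(\tfrac{13}{36}, \tfrac{13}{36}, \tfrac{10}{36}\right) \\ p_4 &= (p_4(1), p_4(2), p_4(3)) = p_1P_3 = p_3P = \left(\tfrac{12}{36}, \tfrac{11}{36}, \tfrac{13}{36}\right). \end{aligned}$$ ◄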
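The arithmetic of example 6A can be checked exactly with Python's `fractions` module. This sketch assumes that $P$ has rows $(0, \tfrac13, \tfrac23)$, $(\tfrac23, 0, \tfrac13)$, $(\tfrac13, \tfrac23, 0)$ and that $p_1 = (\tfrac12, \tfrac14, \tfrac14)$. It computes $p_n$ in two ways: by (6.8), one step at a time, and by (6.7), directly from $p_1$ and $P_{n-1}$. The helper names are illustrative.

```python
from fractions import Fraction as F


def matmul(A, B):
    # c_ij = sum over k of a_ik b_kj, as in (6.10)
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def vecmat(p, P):
    # p_n(j) = sum_k p_{n-1}(k) P(k, j), as in (6.8); with P_{n-1} it gives (6.7)
    return [sum(p[k] * P[k][j] for k in range(len(p))) for j in range(len(P[0]))]


def fmt(rows):
    return [[str(x) for x in row] for row in rows]


P = [[F(0), F(1, 3), F(2, 3)],
     [F(2, 3), F(0), F(1, 3)],
     [F(1, 3), F(2, 3), F(0)]]
p1 = [F(1, 2), F(1, 4), F(1, 4)]

P2 = matmul(P, P)
P3 = matmul(P2, P)
p2 = vecmat(p1, P)
p3 = vecmat(p2, P)
p4 = vecmat(p3, P)

# (6.7) gives the same vectors directly from p_1.
assert p3 == vecmat(p1, P2) and p4 == vecmat(p1, P3)

print(fmt(P2))
print(fmt(P3))
print(fmt([p2, p3, p4]))
```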


We define a Markov chain with $r$ states as ergodic if numbers $\pi_1, \pi_2, \ldots, \pi_r$ exist such that for any states $i$ and $j$

$$\lim_{m \to \infty} P_m(i, j) = \pi_j. \tag{6.13}$$

In words, a Markov chain is ergodic if, as $m$ tends to $\infty$, the $m$-step transition probabilities $P_m(i, j)$ tend to a limit that depends only on the final state $j$ and not on the initial state $i$. If a Markov chain is ergodic, then after a large number of trials it achieves statistical equilibrium in the sense that the unconditional probabilities $p_n(j)$ tend to limits

$$\lim_{n \to \infty} p_n(j) = \pi_j, \tag{6.14}$$

which are the same, no matter what the values of the initial unconditional probabilities $p_1(j)$. To see that (6.13) implies (6.14), take the limit of both sides of (6.7) and use the fact that $\sum_{k=1}^{r} p_1(k) = 1$.

In view of (6.14), we call $\pi_1, \pi_2, \ldots, \pi_r$ the stationary probabilities of the Markov chain, since these represent the probabilities of being in the various states after one has achieved statistical equilibrium.
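Definition (6.13) can be watched at work: for an ergodic chain the rows of $P_m$ become identical as $m$ grows. In this sketch the 3-state matrix is a hypothetical example whose stationary probabilities work out to $(2/9, 4/9, 1/3)$. The helper `spread`, which measures how far the rows still differ, is illustrative.

```python
def matmul(A, B):
    # c_ij = sum over k of a_ik b_kj, as in (6.10)
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def spread(M):
    """Largest gap, over columns j, between the biggest and smallest entry M[i][j].
    Under (6.13) this tends to 0, since every row approaches (pi_1, ..., pi_r)."""
    return max(max(col) - min(col) for col in zip(*M))


# Hypothetical ergodic 3-state chain.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 1 / 3, 2 / 3]]

Pm = P
spreads = {1: spread(P)}
for m in range(2, 61):
    Pm = matmul(Pm, P)          # Pm now holds P_m
    spreads[m] = spread(Pm)

for m in (1, 2, 5, 10, 20, 60):
    print(m, f"{spreads[m]:.2e}")
print([round(x, 6) for x in Pm[0]])   # [0.222222, 0.444444, 0.333333]
```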

One of the important problems of the theory of Markov chains is to determine conditions under which a Markov chain is ergodic. A discussion of this problem is beyond the scope of this book. We state without proof the following theorem.

If there exists an integer $m$ such that

$$P_m(i, j) > 0 \quad \text{for all states } i \text{ and } j, \tag{6.15}$$

then the Markov chain with transition probability matrix $P$ is ergodic.
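Condition (6.15) can be tested mechanically. The sketch below works only with the zero/nonzero pattern of $P$. It stops at $m = (r-1)^2 + 1$, relying on Wielandt's bound for primitive nonnegative matrices, a known result that is not stated in the text. The function names and the example matrix are hypothetical.

```python
def pattern_product(A, B):
    # Boolean form of (6.10): entry (i, j) is True when some k has A[i][k] and B[k][j].
    return [[any(A[i][k] and B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def first_positive_power(P):
    """Smallest m with P_m(i, j) > 0 for all i, j (condition (6.15)), or None.

    Only the zero/nonzero pattern of P matters, so booleans replace probabilities.
    The search stops at m = (r - 1)**2 + 1 (Wielandt's bound): if no power up to
    that point is positive, none is. None means (6.15) fails; by itself it does
    not show that the chain is not ergodic.
    """
    r = len(P)
    base = [[x > 0 for x in row] for row in P]
    Pm = base
    for m in range(1, (r - 1) ** 2 + 2):
        if m > 1:
            Pm = pattern_product(Pm, base)
        if all(all(row) for row in Pm):
            return m
    return None


P = [[0.5, 0.5, 0.0],        # hypothetical 3-state chain
     [0.25, 0.5, 0.25],
     [0.0, 1 / 3, 2 / 3]]
print(first_positive_power(P))                 # 2
print(first_positive_power([[0, 1], [1, 0]]))  # None: the chain alternates
```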


It is sometimes possible to establish that a Markov chain is ergodic without having to exhibit an $m$-step transition probability matrix $P_m$ all of whose entries are positive. Given two states $i$ and $j$ in a Markov chain, we say that one can reach $j$ from $i$ if states $i_1, i_2, \ldots, i_N$ exist such that

$$0 < P(i, i_1)P(i_1, i_2)\cdots P(i_{N-1}, i_N)P(i_N, j). \tag{6.16}$$

Two states $i$ and $j$ are said to communicate if one can reach $i$ from $j$ and also $j$ from $i$. The following theorem can be proved.




If all states in a Markov chain communicate and if a state i exists such 
that P(i, i) > 0, then the Markov chain is ergodic. 
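The reachability relation in (6.16) is a graph search, so the communication test in the theorem above can be sketched as follows. The helper names are hypothetical; `retaining_walk` builds the matrix of the random walk with retaining barriers (example 6C) for a chosen $p$.

```python
def reachable_from(P, i):
    """States that can be reached from i in one or more steps, in the sense of (6.16)."""
    seen, stack = set(), [i]
    while stack:
        a = stack.pop()
        for b, prob in enumerate(P[a]):
            if prob > 0 and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def all_communicate(P):
    states = set(range(len(P)))
    return all(reachable_from(P, i) == states for i in states)


def passes_communication_test(P):
    """Sufficient condition from the text: all states communicate and P(i, i) > 0
    for some i. A False result only means this particular test is inconclusive."""
    return all_communicate(P) and any(P[i][i] > 0 for i in range(len(P)))


def retaining_walk(p, n=6):
    # Random walk with retaining barriers at 0 and n - 1, as in example 6C.
    q = 1 - p
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        P[i][max(i - 1, 0)] += q
        P[i][min(i + 1, n - 1)] += p
    return P


print(passes_communication_test(retaining_walk(0.3)))   # True
print(all_communicate([[0, 1], [1, 0]]),
      passes_communication_test([[0, 1], [1, 0]]))      # True False
```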


Having established that a Markov chain is ergodic, the next problem is to obtain the stationary probabilities $\pi_j$. Letting $m$ tend to $\infty$ in (6.4), one sees that the stationary probabilities satisfy the system of linear equations

$$\pi_j = \sum_{k=1}^{r} \pi_k P(k, j), \qquad j = 1, 2, \ldots, r. \tag{6.17}$$

Consequently, if a Markov chain is ergodic, then a solution of (6.17) that satisfies the conditions

$$\pi_j \ge 0 \quad \text{for } j = 1, 2, \ldots, r, \qquad \sum_{j=1}^{r} \pi_j = 1 \tag{6.18}$$

exists. It may be shown that if a Markov chain with transition probability matrix $P$ is ergodic then the solution of (6.17) satisfying (6.18) is unique and necessarily satisfies (6.13) and (6.14). Consequently, to find the stationary probabilities, we need only solve (6.17) subject to (6.18).
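Solving (6.17) subject to (6.18) is a small linear-algebra problem. Below is a minimal exact sketch, assuming a unique solution exists (as it does for an ergodic chain). Summing all $r$ equations of (6.17) gives an identity, so one of them is redundant. The sketch replaces that equation by $\sum_j \pi_j = 1$ and solves the resulting system by Gauss-Jordan elimination over `Fraction`s. The function name and the example matrix are illustrative.

```python
from fractions import Fraction as F


def stationary(P):
    """Solve (6.17) subject to (6.18) exactly, assuming the solution is unique.

    Equation (6.17) for the last state is dropped (it follows from the others)
    and replaced by the normalization pi_1 + ... + pi_r = 1.
    """
    r = len(P)
    # Row j encodes sum_k pi_k P(k, j) - pi_j = 0, followed by its right-hand side.
    rows = [[F(P[k][j]) - (1 if k == j else 0) for k in range(r)] + [F(0)]
            for j in range(r - 1)]
    rows.append([F(1)] * r + [F(1)])
    # Gauss-Jordan elimination on the augmented matrix.
    for col in range(r):
        piv = next(i for i in range(col, r) if rows[i][col] != 0)
        rows[col], rows[piv] = rows[piv], rows[col]
        rows[col] = [x / rows[col][col] for x in rows[col]]
        for i in range(r):
            if i != col and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[col])]
    return [rows[i][r] for i in range(r)]


# Hypothetical ergodic chain (all states communicate and P(1, 1) > 0).
P = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0), F(1, 3), F(2, 3)]]
print([str(x) for x in stationary(P)])   # ['2/9', '4/9', '1/3']
```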


► Example 6B. The Markov chain considered in example 6A is ergodic, since $P_2(i, j) > 0$ for all states $i$ and $j$. To compute the stationary probabilities $\pi_1, \pi_2, \pi_3$, we need only to solve the equations

$$\begin{aligned} \pi_1 &= \tfrac23\pi_2 + \tfrac13\pi_3 \\ \pi_2 &= \tfrac13\pi_1 + \tfrac23\pi_3 \\ \pi_3 &= \tfrac23\pi_1 + \tfrac13\pi_2 \end{aligned} \tag{6.19}$$

subject to (6.18). It is clear that

$$\pi_1 = \pi_2 = \pi_3 = \tfrac13$$

is a solution of (6.19) satisfying (6.18). In the long run, the states 1, 2, and 3 are equally likely to be the state of the Markov chain. ◄


A matrix $A$ with $r$ rows and $r$ columns and entries $a_{ij}$ is defined as stochastic if the sum of the entries in any row is equal to 1; in symbols, $A$ is stochastic if

$$\sum_{j=1}^{r} a_{ij} = 1 \quad \text{for } i = 1, 2, \ldots, r. \tag{6.20}$$

The matrix $A$ is defined as doubly stochastic if in addition the sum of the entries in any column is equal to 1; in symbols, $A$ is doubly stochastic if (6.20) holds and also

$$\sum_{i=1}^{r} a_{ij} = 1 \quad \text{for } j = 1, 2, \ldots, r. \tag{6.21}$$


It is clear that the transition probability matrix $P$ of a Markov chain is stochastic. If $P$ is doubly stochastic (as is the matrix in example 6A), then the stationary probabilities are given by

$$\pi_1 = \pi_2 = \cdots = \pi_r = \frac{1}{r}, \tag{6.22}$$

in which $r$ is the number of states in the Markov chain. To prove (6.22) one need only verify that if $P$ is doubly stochastic then (6.22) satisfies (6.17) and (6.18).
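Claim (6.22) is easy to spot-check. This sketch, with hypothetical matrices, verifies (6.20) and (6.21) and then tests whether the uniform vector $(1/3, 1/3, 1/3)$ satisfies (6.17). It does for the doubly stochastic matrix, and it fails for a matrix that is only stochastic.

```python
from fractions import Fraction as F


def is_doubly_stochastic(A):
    r = len(A)
    rows_ok = all(sum(row) == 1 for row in A)                             # (6.20)
    cols_ok = all(sum(A[i][j] for i in range(r)) == 1 for j in range(r))  # (6.21)
    return rows_ok and cols_ok


def satisfies_6_17(P, pi):
    r = len(P)
    return all(sum(pi[k] * P[k][j] for k in range(r)) == pi[j] for j in range(r))


# Hypothetical doubly stochastic matrix and a merely stochastic one.
D = [[F(1, 2), F(1, 4), F(1, 4)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(1, 4), F(1, 4), F(1, 2)]]
S = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0), F(1, 3), F(2, 3)]]
uniform = [F(1, 3)] * 3

print(is_doubly_stochastic(D), satisfies_6_17(D, uniform))   # True True
print(is_doubly_stochastic(S), satisfies_6_17(S, uniform))   # False False
```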


► Example 6C. Random walk with retaining barriers. Consider a straight line on which are marked off positions 0, 1, 2, 3, 4, and 5, arranged from left to right. A man (or an atomic particle, if one prefers physically significant examples) performs a random walk among the six positions by tossing a coin that has probability $p$ (where $0 < p < 1$) of falling heads and acting in accordance with the following set of rules: if the coin falls heads, move one position to the right, if at 0, 1, 2, 3, or 4, and remain at 5, if at 5; if the coin falls tails, move one position to the left, if at 1, 2, 3, 4, or 5, and remain at 0, if at 0. The positions 0 and 5 are retaining barriers; one cannot move past them. In example 6D we consider the case in which positions 0 and 5 are absorbing barriers; if one reaches these positions, the walk stops. The transition probability matrix of the random walk with retaining barriers is given by

$$P = \begin{bmatrix} q & p & 0 & 0 & 0 & 0 \\ q & 0 & p & 0 & 0 & 0 \\ 0 & q & 0 & p & 0 & 0 \\ 0 & 0 & q & 0 & p & 0 \\ 0 & 0 & 0 & q & 0 & p \\ 0 & 0 & 0 & 0 & q & p \end{bmatrix}, \tag{6.23}$$

in which $q = 1 - p$.


All states in this Markov chain communicate, since

$$0 < P(0, 1)P(1, 2)P(2, 3)P(3, 4)P(4, 5)P(5, 4)P(4, 3)P(3, 2)P(2, 1)P(1, 0).$$

The chain is ergodic, since $P(0, 0) > 0$. To find the stationary probabilities $\pi_0, \pi_1, \ldots, \pi_5$, we solve the system of equations

$$\begin{aligned} \pi_0 &= q\pi_0 + q\pi_1 \\ \pi_1 &= p\pi_0 + q\pi_2 \\ \pi_2 &= p\pi_1 + q\pi_3 \\ \pi_3 &= p\pi_2 + q\pi_4 \\ \pi_4 &= p\pi_3 + q\pi_5 \\ \pi_5 &= p\pi_4 + p\pi_5. \end{aligned} \tag{6.24}$$


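We solve these equations by successive substitution. From the first equation we obtain

$$q\pi_1 = p\pi_0 \quad \text{or} \quad \pi_1 = \frac{p}{q}\,\pi_0.$$

By subtracting this result from the second equation in (6.24), we obtain

$$q\pi_2 = p\pi_1 \quad \text{or} \quad \pi_2 = \frac{p}{q}\,\pi_1 = \left(\frac{p}{q}\right)^2 \pi_0.$$

Similarly, we obtain

$$q\pi_3 = p\pi_2 \quad \text{or} \quad \pi_3 = \left(\frac{p}{q}\right)^3 \pi_0,$$

$$q\pi_4 = p\pi_3 \quad \text{or} \quad \pi_4 = \left(\frac{p}{q}\right)^4 \pi_0,$$

$$q\pi_5 = p\pi_4 \quad \text{or} \quad \pi_5 = \left(\frac{p}{q}\right)^5 \pi_0.$$

To determine $\pi_0$, we use the fact that

$$1 = \pi_0 + \pi_1 + \cdots + \pi_5 = \pi_0\left[1 + \frac{p}{q} + \cdots + \left(\frac{p}{q}\right)^5\right] = \begin{cases} \pi_0\, \dfrac{1 - (p/q)^6}{1 - (p/q)} & \text{if } p \neq q, \\[2ex] 6\pi_0 & \text{if } p = q = \tfrac12. \end{cases}$$

We finally conclude that the stationary probabilities for the random walk with retaining barriers for $j = 0, 1, \ldots, 5$ are given by

$$\pi_j = \begin{cases} \left(\dfrac{p}{q}\right)^{j} \dfrac{1 - (p/q)}{1 - (p/q)^6} & \text{if } p \neq q, \\[2ex] \dfrac{1}{6} & \text{if } p = q = \tfrac12. \end{cases} \tag{6.25}$$ ◄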
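Formula (6.25) can be checked exactly against (6.24). The sketch below builds matrix (6.23), evaluates (6.25), and verifies stationarity with `Fraction` arithmetic. The parameter `n` generalizes the six positions of the example; like the helper names, it is an illustrative assumption.

```python
from fractions import Fraction as F


def retaining_walk(p, n=6):
    """Transition matrix (6.23) for positions 0, ..., n-1 with retaining barriers."""
    q = 1 - p
    P = [[F(0)] * n for _ in range(n)]
    for i in range(n):
        P[i][max(i - 1, 0)] += q        # tails: step left, or stay at 0
        P[i][min(i + 1, n - 1)] += p    # heads: step right, or stay at n-1
    return P


def formula_6_25(p, n=6):
    """Stationary probabilities from (6.25), written for n positions (n = 6 in the text)."""
    q = 1 - p
    if p == q:
        return [F(1, n)] * n
    rho = p / q
    return [rho ** j * (1 - rho) / (1 - rho ** n) for j in range(n)]


def is_stationary(P, pi):
    """Exact check of (6.17) and of the normalization in (6.18)."""
    r = len(P)
    return sum(pi) == 1 and all(
        sum(pi[k] * P[k][j] for k in range(r)) == pi[j] for j in range(r))


for p in (F(1, 3), F(1, 2), F(3, 4)):
    pi = formula_6_25(p)
    print(p, [str(x) for x in pi], is_stationary(retaining_walk(p), pi))
```

The check works because $\pi_{j+1}q = \pi_j p$ for neighbouring positions, which is exactly the relation obtained by successive substitution above.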




If a Markov chain is ergodic, then the physical process represented by the Markov chain can continue indefinitely. Indeed, after a long time it achieves statistical equilibrium, and probabilities $\pi_1, \ldots, \pi_r$ of being in the various states exist that depend only on the transition probability matrix $P$.

We next desire to study an important class of nonergodic Markov chains, namely those that possess absorbing states. A state $j$ in a Markov chain is said to be absorbing if $P(j, i) = 0$ for all states $i \neq j$, so that it is impossible to leave an absorbing state. Equivalently, a state $j$ is absorbing if $P(j, j) = 1$.

► Example 6D. Random walk with absorbing barriers. Consider a straight line on which positions 0, 1, 2, 3, 4, and 5, arranged from left to right, are marked off. Consider a man who performs a random walk among the six positions according to the following transition probability matrix:

$$P = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ q & 0 & p & 0 & 0 & 0 \\ 0 & q & 0 & p & 0 & 0 \\ 0 & 0 & q & 0 & p & 0 \\ 0 & 0 & 0 & q & 0 & p \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}. \tag{6.26}$$

In the Markov chain with transition probability matrix $P$, given by (6.26), the states 0 and 5 are absorbing states; consequently, this Markov chain is called a random walk with absorbing barriers. The model of a random walk with absorbing barriers describes the fortunes of gamblers with finite capital. Let two opponents, A and B, have 5 cents between them. Let A toss a coin, which has probability $p$ of falling heads. On each toss he wins a penny if the coin falls heads and loses a penny if the coin falls tails. For $j = 0, \ldots, 5$ we define the chain to be in state $j$ if A has $j$ cents. ◄

Given a Markov chain with an absorbing state $j$, it is of interest to compute for each state $i$

(6.27) $u_j(i)$ = conditional probability of ever arriving at the absorbing state $j$, given that one started from state $i$.

We call $u_j(i)$ the probability of absorption in state $j$, given the initial state $i$, since one remains in $j$ if one ever arrives there.

The probability $u_j(i)$ is defined on a sample description space consisting of a countably infinite number of trials. We do not in this book discuss the definition of probabilities on such sample spaces. Consequently, we cannot give a proof of the following basic theorem, which facilitates the computation of the absorption probabilities $u_j(i)$.

If $j$ is an absorbing state in a Markov chain with states $\{1, 2, \ldots, r\}$, then the absorption probabilities $u_j(1), \ldots, u_j(r)$ are the unique solution to the system of equations:

$$\begin{aligned} u_j(j) &= 1, \\ u_j(i) &= 0, && \text{if } j \text{ cannot be reached from } i, \\ u_j(i) &= \sum_{k=1}^{r} P(i, k)\,u_j(k), && \text{if } j \text{ can be reached from } i. \end{aligned} \tag{6.28}$$


Equation (6.28) is proved as follows. The probability of going from state $i$ to state $j$ is the sum, over all states $k$, of the probability of going from $i$ to $j$ via $k$. This probability is the product of the probability $P(i, k)$ of going from $i$ to $k$ in one step and the probability $u_j(k)$ of then ever passing from $k$ to $j$.
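Because (6.28) is a linear system, absorption probabilities can be computed directly. Below is a minimal exact sketch, assuming $j$ is absorbing. States from which $j$ cannot be reached get $u_j(i) = 0$, $u_j(j) = 1$, and the remaining equations are solved by Gauss-Jordan elimination. The helper names are hypothetical, and the demonstration uses the absorbing walk (6.26) with $p = \tfrac12$.

```python
from fractions import Fraction as F


def can_reach(P, target):
    """Set of states from which `target` can be reached (target included)."""
    r = len(P)
    reach = {target}
    changed = True
    while changed:
        changed = False
        for i in range(r):
            if i not in reach and any(P[i][k] > 0 and k in reach for k in range(r)):
                reach.add(i)
                changed = True
    return reach


def solve(A, b):
    """Gauss-Jordan elimination for a nonsingular square system A x = b."""
    n = len(A)
    M = [[F(x) for x in row] + [F(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        piv = next(i for i in range(c, n) if M[i][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        M[c] = [x / M[c][c] for x in M[c]]
        for i in range(n):
            if i != c and M[i][c] != 0:
                factor = M[i][c]
                M[i] = [a - factor * e for a, e in zip(M[i], M[c])]
    return [M[i][n] for i in range(n)]


def absorption_probs(P, j):
    """u_j(i) for every state i, from (6.28); j is assumed to be absorbing."""
    r = len(P)
    reach = can_reach(P, j)
    u = [F(0)] * r                  # u_j(i) = 0 if j cannot be reached from i
    u[j] = F(1)                     # u_j(j) = 1
    unknown = [i for i in sorted(reach) if i != j]
    idx = {s: n for n, s in enumerate(unknown)}
    A = [[F(0)] * len(unknown) for _ in unknown]
    b = [F(0)] * len(unknown)
    for s in unknown:               # u_j(s) - sum_k P(s, k) u_j(k) = 0
        row = idx[s]
        A[row][row] += 1
        for k in range(r):
            if k in idx:
                A[row][idx[k]] -= P[s][k]
            elif k == j:
                b[row] += P[s][k]   # known term P(s, j) * u_j(j)
    for s, value in zip(unknown, solve(A, b) if unknown else []):
        u[s] = value
    return u


def absorbing_walk(p, n=6):
    """Transition matrix (6.26): absorbing barriers at 0 and n - 1."""
    q = 1 - p
    P = [[F(0)] * n for _ in range(n)]
    P[0][0] = P[n - 1][n - 1] = F(1)
    for i in range(1, n - 1):
        P[i][i - 1], P[i][i + 1] = q, p
    return P


print([str(x) for x in absorption_probs(absorbing_walk(F(1, 2)), 0)])
# ['1', '4/5', '3/5', '2/5', '1/5', '0']
```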


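► Example 6E. Probability of a gambler's ruin. Let A and B play the coin-tossing game described in example 6D. If A's initial fortune is 3 cents and B's initial fortune is 2 cents, what is the probability that A's fortune will be 0 cents before it is 5 cents, and A will be ruined?

Solution: For $i = 0, 1, \ldots, 5$ let $u_0(i)$ be the probability that A's fortune will ever be 0, given that his initial fortune was $i$ cents. In view of (6.28), the absorption probabilities $u_0(i)$ are the unique solution of the equations

$$\begin{aligned}
u_0(0) &= 1 \\
u_0(1) &= qu_0(0) + pu_0(2) && \text{or} \quad q[u_0(1) - u_0(0)] = p[u_0(2) - u_0(1)] \\
u_0(2) &= qu_0(1) + pu_0(3) && \text{or} \quad q[u_0(2) - u_0(1)] = p[u_0(3) - u_0(2)] \\
u_0(3) &= qu_0(2) + pu_0(4) && \text{or} \quad q[u_0(3) - u_0(2)] = p[u_0(4) - u_0(3)] \\
u_0(4) &= qu_0(3) + pu_0(5) && \text{or} \quad q[u_0(4) - u_0(3)] = p[u_0(5) - u_0(4)] \\
u_0(5) &= 0.
\end{aligned} \tag{6.29}$$

To solve these equations we note that, defining $c = u_0(1) - u_0(0)$,

$$\begin{aligned}
u_0(2) - u_0(1) &= \frac{q}{p}\,[u_0(1) - u_0(0)] = \frac{q}{p}\,c \\
u_0(3) - u_0(2) &= \frac{q}{p}\,[u_0(2) - u_0(1)] = \left(\frac{q}{p}\right)^2 c \\
u_0(4) - u_0(3) &= \frac{q}{p}\,[u_0(3) - u_0(2)] = \left(\frac{q}{p}\right)^3 c \\
u_0(5) - u_0(4) &= \frac{q}{p}\,[u_0(4) - u_0(3)] = \left(\frac{q}{p}\right)^4 c.
\end{aligned}$$

Therefore, there is a constant $c$ such that (since $u_0(0) = 1$)

$$\begin{aligned}
u_0(1) &= 1 + c \\
u_0(2) &= \frac{q}{p}\,c + 1 + c \\
u_0(3) &= \left(\frac{q}{p}\right)^2 c + \frac{q}{p}\,c + 1 + c \\
u_0(4) &= \left(\frac{q}{p}\right)^3 c + \left(\frac{q}{p}\right)^2 c + \frac{q}{p}\,c + 1 + c \\
u_0(5) &= \left(\frac{q}{p}\right)^4 c + \left(\frac{q}{p}\right)^3 c + \left(\frac{q}{p}\right)^2 c + \frac{q}{p}\,c + 1 + c.
\end{aligned}$$

To determine the constant $c$, we use the fact that $u_0(5) = 0$. We see that

$$1 + 5c = 0 \quad \text{if } p = q = \tfrac12, \qquad 1 + c\,\frac{1 - (q/p)^5}{1 - (q/p)} = 0 \quad \text{if } p \neq q,$$

so that

$$c = -\frac{1}{5} \quad \text{if } p = q = \tfrac12, \qquad c = -\frac{1 - (q/p)}{1 - (q/p)^5} \quad \text{if } p \neq q.$$

Consequently, for $i = 0, 1, \ldots, 5$,

$$u_0(i) = \begin{cases} 1 - \dfrac{i}{5} & \text{if } p = q = \tfrac12, \\[2ex] \dfrac{(q/p)^i - (q/p)^5}{1 - (q/p)^5} & \text{if } p \neq q. \end{cases} \tag{6.30}$$

In particular, the probability that A will be ruined, given his initial fortune is 3 cents, is given by

$$u_0(3) = \begin{cases} \dfrac{2}{5} & \text{if } p = q = \tfrac12, \\[2ex] \dfrac{(q/p)^3 - (q/p)^5}{1 - (q/p)^5} & \text{if } p \neq q. \end{cases}$$ ◄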
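As a check on (6.30), the closed form can be substituted back into (6.29) with exact fractions. The function names and the parameter `total` (5 cents in the example) are illustrative. For $p = \tfrac23$, the ruin probability from 3 cents comes out as $3/31$.

```python
from fractions import Fraction as F


def ruin_prob(i, p, total=5):
    """u_0(i) from (6.30): chance that A, holding i of the `total` cents, is ruined."""
    p = F(p)
    q = 1 - p
    if p == q:
        return 1 - F(i, total)
    rho = q / p
    return (rho ** i - rho ** total) / (1 - rho ** total)


def satisfies_6_29(p, total=5):
    """Substitute the closed form back into the equations (6.29)."""
    p = F(p)
    q = 1 - p
    u = [ruin_prob(i, p, total) for i in range(total + 1)]
    return (u[0] == 1 and u[total] == 0 and
            all(u[i] == q * u[i - 1] + p * u[i + 1] for i in range(1, total)))


for p in (F(1, 2), F(2, 3), F(2, 5)):
    print(p, ruin_prob(3, p), satisfies_6_29(p))
```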


EXERCISES

6.1. Compute the 2-step and 3-step transition probability matrices for the Markov chains whose transition probability matrices are given by

1 4 110
© ғ- | 31 d) P=|} 40
ИЕ 0 1]
TE 110
di) - P-2|1 4 of, (jv) P=/o 1
001 то

6.2. For each Markov chain in exercise 6.1, determine whether or not (i) it is ergodic, (ii) it has absorbing states.

6.3. Find the stationary probabilities for each of the following ergodic Markov chains:

1 0.99 |
ib Gi’)
;] bo 0.99
2 1
(i) \$ an (ii) [
3 3.
coro олю

6.4. Find the stationary probabilities for each of the following ergodic Markov chains:

TE iid bid
(i) iii (i оз i| (ii) 121 1%
Pid ЖҮ bd ve

6.5. Consider a series of independent repeated tosses of a coin that has probability $p > 0$ of falling heads. Let us say that at time $n$ we are in state $s_1$, $s_2$, $s_3$, or $s_4$, depending on whether outcomes of tosses $n - 1$ and $n$ were (H, H), (H, T), (T, H), or (T, T). Find the transition probability matrix $P$ of this Markov chain. Also find $P_2$, $P_3$, $P_4$.

6.6. Random walk with retaining barriers. Consider a straight line on which positions 0, 1, 2, ..., 7 are marked off. Consider a man who performs a random walk among the positions according to the following transition probability matrix:

$$P = \begin{bmatrix} q & p & 0 & 0 & 0 & 0 & 0 & 0 \\ q & 0 & p & 0 & 0 & 0 & 0 & 0 \\ 0 & q & 0 & p & 0 & 0 & 0 & 0 \\ 0 & 0 & q & 0 & p & 0 & 0 & 0 \\ 0 & 0 & 0 & q & 0 & p & 0 & 0 \\ 0 & 0 & 0 & 0 & q & 0 & p & 0 \\ 0 & 0 & 0 & 0 & 0 & q & 0 & p \\ 0 & 0 & 0 & 0 & 0 & 0 & q & p \end{bmatrix}$$

Prove that the Markov chain is ergodic. Find the stationary probabilities.

6.7. Gambler's ruin. Let two players A and B have 7 cents between them. Let A toss a coin, which has probability $p$ of falling heads. On each toss he wins a penny if the coin falls heads and he loses a penny if the coin falls tails. If A's initial fortune is 3 cents, what is the probability that A's fortune will be 0 cents before it is 7 cents, and that A will be ruined?

6.8. Consider 2 urns, I and II, each of which contains 1 white and 1 red ball. One ball is drawn simultaneously from each urn and placed in the other urn. Let the probabilities that after $n$ repetitions of this procedure urn I will contain 2 white balls, 1 white and 1 red, or 2 red balls be denoted by $p_n$, $q_n$, and $r_n$, respectively. Deduce formulas expressing $p_{n+1}$, $q_{n+1}$, and $r_{n+1}$ in terms of $p_n$, $q_n$, and $r_n$. Show that $p_n$, $q_n$, and $r_n$ tend to limiting values as $n$ tends to infinity. Interpret these values.

6.9. In exercise 6.8 find the most probable number of red balls in urn I after (i) 2, (ii) 6 exchanges.


CHAPTER 4 


Numerical-Valued 


Random Phenomena 


In the foregoing we have considered mainly random phenomena whose 
sample description spaces were finite. We next consider random phenomena 
for which this is not necessarily the case. The simplest example of a 
random phenomenon whose sample description space is not necessarily 
finite is one which is numerical valued. The height of waves on a wind- 
swept sea, the number of alpha particles emitted from a radioactive source, 
the number of telephone calls arriving at a switchboard, the velocity of a 
particle in Brownian motion, the scores of students on an examination, 
the collar sizes of men, the dress sizes of women, and so on, constitute 
examples of numerical-valued random phenomena. In this chapter we 


discuss the notions and techniques used to treat numerical-valued random 
phenomena. 


1. THE NOTION OF A NUMERICAL-VALUED 
RANDOM PHENOMENON 


To introduce the notion of a numerical-valued random phenomenon, let us first consider a random phenomenon whose sample description space $S$ is a set of real numbers; for example, the number of white balls in a sample of size $n$ drawn from an urn or the number of hits in $n$ independent throws of a dart. For the sample description space of each of these random phenomena one may take the set $\{0, 1, 2, \ldots, n\}$. However, it has already been indicated (in section 3 of Chapter 1) that one may make the sample description space $S$ as large as one pleases, at the price of having a large number of sample descriptions in $S$ to which zero probability is assigned. Consequently, we may take for the sample description space of these phenomena the set of all real numbers from $-\infty$ to $\infty$. The advantage of this procedure might be that it would render possible a unified theory of random phenomena whose sample description spaces are sets of real numbers.

There is still another advantage. Suppose one is measuring the weight of persons belonging to a certain group. One may measure the weight to the nearest pound, the nearest tenth of a pound, or the nearest hundredth of a pound. In the first case the space $S = \{\text{real numbers } x\colon x = k \text{ for some integer } k = 0, 1, 2, \ldots, 10^4\}$ would suffice as the sample description space; in the second case $S = \{\text{real numbers } x\colon x = k/10 \text{ for some integer } k = 0, 1, 2, \ldots, 10^5\}$ would suffice; in the third case $S = \{\text{real numbers } x\colon x = k/100 \text{ for some integer } k = 0, 1, 2, \ldots, 10^6\}$ would suffice. Nevertheless, it might be preferable in all three cases to take as one's sample description space the set of all numbers from $-\infty$ to $\infty$ and to develop the difference between the three cases in terms of the different probability functions adopted to describe the three random phenomena.

We are thus led to define the notion of a numerical-valued random phenomenon as a random phenomenon whose sample description space is the set $R$, consisting of all real numbers from $-\infty$ to $\infty$. The set $R$ may be represented geometrically by a real line, which is an infinitely long line on which an origin and a unit distance have been marked off; then to every point on the line there corresponds a real number and to every real number there corresponds a point on the line.

We have previously defined an event as a set of sample descriptions; consequently, events defined on numerical-valued random phenomena are sets of real numbers. However, not every set of real numbers can be regarded as an event. There are certain sets of real numbers, defined by exceedingly involved limiting operations, that are nonprobabilizable, in the sense that for these sets it is not in general possible to answer, in a manner consistent with the axioms below, the question, "what is the probability that a given numerical-valued random phenomenon will have an observed value in the set?" Consequently, by the word "event" we mean not any set of real numbers but only a probabilizable set of real numbers. We do not possess at this stage in our discussion the notions with which to characterize the sets of real numbers that are probabilizable. We can point out only that it may be shown that the family (call it $\mathcal{F}$) of probabilizable sets always has the following properties:

(i) To $\mathcal{F}$ belongs any interval (an interval is a set of real numbers of the form $\{x\colon a < x < b\}$, $\{x\colon a < x \le b\}$, $\{x\colon a \le x < b\}$, or $\{x\colon a \le x \le b\}$, in which $a$ and $b$ may be finite or infinite numbers).

(ii) To $\mathcal{F}$ belongs the complement $A^c$ of any set $A$ belonging to $\mathcal{F}$.

(iii) To $\mathcal{F}$ belongs the union $\bigcup_{n=1}^{\infty} A_n$ of any sequence of sets $A_1, A_2, \ldots, A_n, \ldots$ belonging to $\mathcal{F}$.

If we desire to give a precise definition of the notion of an event at this stage in our discussion, we may do so as follows. There exists a smallest family of sets on the real line with the properties (i), (ii), and (iii). This family is denoted by $\mathcal{B}$, and any member of $\mathcal{B}$ is called a Borel set, after the great French mathematician and probabilist Émile Borel. Since $\mathcal{B}$ is the smallest family to possess properties (i), (ii), and (iii), it follows that $\mathcal{B}$ is contained in $\mathcal{F}$, the family of probabilizable sets. Thus every Borel set is probabilizable. Since the needs of mathematical rigor are fully met by restricting our discussion to Borel sets, in this book, by an "event" concerning a numerical-valued random phenomenon, we mean a Borel set of real numbers.

We sum up the discussion of this section in a formal definition.

A numerical-valued random phenomenon is a random phenomenon whose sample description space is the set $R$ (of all real numbers from $-\infty$ to $\infty$) on whose subsets is defined a function $P[\cdot]$, which to every Borel set of real numbers (also called an event) $E$ assigns a nonnegative real number, denoted by $P[E]$, according to the following axioms:

AXIOM 1. $P[E] \ge 0$ for every event $E$.

AXIOM 2. $P[R] = 1$.

AXIOM 3. For any sequence of events $E_1, E_2, \ldots, E_n, \ldots$ which is mutually exclusive,

$$P\Big[\bigcup_{n=1}^{\infty} E_n\Big] = \sum_{n=1}^{\infty} P[E_n].$$
► Example 1A. Consider the random phenomenon that consists in observing the time one has to wait for a bus at a certain downtown bus stop. Let $A$ be the event that one has to wait between 0 and 2 minutes, inclusive, and let $B$ be the event that one has to wait between 1 and 3 minutes, inclusive. Assume that $P[A] = \tfrac12$, $P[B] = \tfrac12$, $P[AB] = \tfrac13$. We can now answer all the usual questions about the events $A$ and $B$. The conditional probability $P[B \mid A]$ that $B$ has occurred given that $A$ has occurred is $P[AB]/P[A] = \tfrac23$. The probability that neither the event $A$ nor the event $B$ has occurred is given by $P[A^cB^c] = 1 - P[A \cup B] = 1 - P[A] - P[B] + P[AB] = \tfrac13$. ◄


EXERCISE 


1.1. Consider the events А and В defined in example 1A. Assuming that 

Р[А] = РІВ] = 5, P[AB] = 1, find the probability for k =0,1,2, that 
(i) exactly $k$, (ii) at least $k$, (iii) no more than $k$ of the events $A$ and $B$ will occur.




2. SPECIFYING THE PROBABILITY FUNCTION OF A 
NUMERICAL-VALUED RANDOM PHENOMENON 


Consider the probability function $P[\cdot]$ of a numerical-valued random phenomenon. The question arises concerning the convenient ways of stating the function without having actually to state the value of $P[E]$ for every set of real numbers $E$. In general, to state the function $P[\cdot]$, as with any function, one has to enumerate all the members of the domain of the function $P[\cdot]$, and for each of these members of the domain one states the value of the function. In special circumstances (which fortunately cover most of the cases encountered in practice) more convenient methods are available.

For many probability functions there exists a function $f(\cdot)$, defined for all real numbers $x$, from which $P[E]$ can, for any event $E$, be obtained by integration:

$$P[E] = \int_E f(x)\,dx. \tag{2.1}$$

Given a probability function $P[\cdot]$, which may be represented in the form of (2.1) in terms of some function $f(\cdot)$, we call the function $f(\cdot)$ the probability density function of the probability function $P[\cdot]$, and we say that the probability function $P[\cdot]$ is specified by the probability density function $f(\cdot)$.

A function $f(\cdot)$ must have certain properties in order to be a probability density function. To begin with, it must be sufficiently well behaved as a function so that the integral* in (2.1) is well defined. Next, letting $E = R$ in (2.1),

$$1 = P[R] = \int_R f(x)\,dx = \int_{-\infty}^{\infty} f(x)\,dx. \tag{2.2}$$


* We usually assume that the integral in (2.1) is defined in the sense of Riemann; to ensure that this is the case, we require that the function $f(\cdot)$ be defined and continuous at all but a finite number of points. The integral in (2.1) is then defined only for events $E$ which are either intervals or unions of a finite number of nonoverlapping intervals. In advanced probability theory the integral in (2.1) is defined by means of a theory of integration developed in the early 1900's by Henri Lebesgue. The function $f(\cdot)$ must then be a Borel function, by which is meant that for any real number $c$ the set $\{x\colon f(x) < c\}$ is a Borel set. A function that is continuous at all but a finite number of points may be shown to be a Borel function. It may be shown that if a Borel function $f(\cdot)$ satisfies (2.2) and (2.3) then, for any Borel set $B$, the integral of $f(\cdot)$ over $B$ exists as an integral defined in the sense of Lebesgue. If $B$ is an interval, or a union of a finite number of nonoverlapping intervals, and if $f(\cdot)$ is continuous on $B$, then the integral of $f(\cdot)$ over $B$, defined in the sense of Lebesgue, has the same value as the integral of $f(\cdot)$ over $B$, defined in the sense of Riemann. Henceforth, in this book the word function (unless otherwise qualified) will mean a Borel function and the word set (of real numbers) will mean a Borel set.


Fig. 2A. Graphs of the probability density functions given in the exercises indicated (panels: Exercise 2.1(i), 2.1(ii), 2.1(iii); Exercise 2.2(i), 2.2(ii), 2.2(iii); Exercise 2.3(i), 2.3(ii), 2.3(iii); Exercise 2.4(i), 2.4(ii); Exercise 2.5(iv); each panel plots f(x) against x).


It is necessary that $f(\cdot)$ satisfy (2.2); in words, the integral of $f(\cdot)$ from $-\infty$ to $\infty$ must be equal to 1.

A function $f(\cdot)$ is said to be a probability density function if it satisfies (2.2) and, in addition,* satisfies the condition

$$f(x) \ge 0 \quad \text{for all } x \text{ in } R, \tag{2.3}$$

since a function $f(\cdot)$ satisfying (2.2) and (2.3) is the probability density function of a unique probability function $P[\cdot]$, namely the probability function with value $P[E]$ at any event $E$ given by (2.1). Some typical probability density functions are illustrated in Fig. 2A.
► Example 2A. Verifying that a function is a probability density function. Suppose one is told that the time one has to wait for a bus on a certain street corner is a numerical-valued random phenomenon, with a probability function specified by the probability density function f(·), given by

(2.4) $f(x) = 4x - 2x^2 - 1$ for $0 \le x \le 2$; $f(x) = 0$ otherwise.

The function f(·) is negative for various values of x; in particular, it is negative for $0 \le x < \tfrac{1}{4}$ (prove this statement). Consequently, it is not possible for f(·) to be a probability density function. Next, suppose that the probability density function f(·) is given by

(2.5) $f(x) = 4x - 2x^2$ for $0 \le x \le 2$; $f(x) = 0$ otherwise.

The function f(·), given by (2.5), is nonnegative (prove this statement). However, its integral from $-\infty$ to $\infty$,

$$\int_{-\infty}^{\infty} f(x)\,dx = \int_0^2 (4x - 2x^2)\,dx = 8 - \frac{16}{3} = \frac{8}{3},$$

is not equal to 1. Consequently the function f(·), given by (2.5), is not a probability density function. However, the function f(·), given by

$f(x) = \tfrac{3}{8}(4x - 2x^2)$ for $0 \le x \le 2$; $f(x) = 0$ otherwise,

is a probability density function. ◄
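The two conditions checked in example 2A can also be probed numerically. The sketch below is only a grid-based sanity check, not a proof: it samples each candidate on [0, 2] (all three vanish outside that interval) and applies the midpoint rule. The helper names `midpoint_integral` and `check_density`, the grid size, and the tolerance are illustrative choices, not part of the text.

```python
# Grid-based sanity check of conditions (2.2) and (2.3) for the three candidate
# functions of example 2A. Each candidate is zero outside [0, 2], so only that
# interval is examined. A check like this can refute a candidate, never prove one.

def midpoint_integral(f, a, b, n=20_000):
    """Approximate the integral of f over [a, b] by the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def check_density(f, a, b, n=20_000, tol=1e-6):
    """Illustrative helper: return (smallest grid value, integral, passes both checks)."""
    # a + (b - a) * i / n hits both endpoints exactly, so a function that equals 0
    # at an endpoint is not flagged as negative because of rounding error.
    smallest = min(f(a + (b - a) * i / n) for i in range(n + 1))
    total = midpoint_integral(f, a, b, n)
    return smallest, total, smallest >= 0 and abs(total - 1) < tol

f_24 = lambda x: 4 * x - 2 * x**2 - 1        # (2.4): negative, e.g., at x = 0
f_25 = lambda x: 4 * x - 2 * x**2            # (2.5): nonnegative, integral 8/3
f_ok = lambda x: 3 / 8 * (4 * x - 2 * x**2)  # (2.5) rescaled by 3/8

for name, f in [("(2.4)", f_24), ("(2.5)", f_25), ("3/8 * (2.5)", f_ok)]:
    smallest, total, ok = check_density(f, 0.0, 2.0)
    print(f"{name}: min = {smallest:.4f}, integral = {total:.6f}, density? {ok}")
```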
* For the purposes of this book we also require that a probability density function f(·) be defined and continuous at all but a finite number of points.

► Example 2B. Computing probabilities from a probability density function. Let us consider again the numerical-valued random phenomenon, discussed in example 1A, that consists in observing the time one has to wait for a bus at a certain bus stop. Let us assume that the probability function P[·] of this phenomenon may be expressed by (2.1) in terms of the function f(·), whose graph is sketched in Fig. 2B. An algebraic formula for f(·) can be written as follows:
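(2.6)
$$
f(x) = \begin{cases}
0 & \text{for } x < 0,\\
\tfrac{1}{9}(x + 1) & \text{for } 0 \le x < 1,\\
\tfrac{4}{9}\left(x - \tfrac{1}{2}\right) & \text{for } 1 \le x < \tfrac{3}{2},\\
\tfrac{4}{9}\left(\tfrac{5}{2} - x\right) & \text{for } \tfrac{3}{2} \le x < 2,\\
\tfrac{1}{9}(4 - x) & \text{for } 2 \le x < 3,\\
\tfrac{1}{9} & \text{for } 3 \le x < 6,\\
0 & \text{for } 6 \le x.
\end{cases}
$$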
Fig. 2B. Graph of the probability density function f(·) defined by (2.6).


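From (2.1) it follows that if $A = \{x: 0 \le x \le 2\}$ and $B = \{x: 1 \le x \le 3\}$, then

$$P[A] = \int_0^2 f(x)\,dx = \frac{1}{2}, \qquad P[B] = \int_1^3 f(x)\,dx = \frac{1}{2}, \qquad P[AB] = \int_1^2 f(x)\,dx = \frac{1}{3},$$

which agree with the values assumed in example 1A. ◄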
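Because every piece of the density in example 2B is linear, the integrals can be evaluated exactly, piece by piece, with rational arithmetic. A minimal sketch using Python's standard `fractions` module; the `PIECES` table and the name `prob_interval` are our own illustrative choices, and the table simply transcribes the piecewise formula for f(·).

```python
from fractions import Fraction as Fr

# The density of example 2B as linear pieces (lo, hi, slope, intercept):
# on [lo, hi) the density equals slope * x + intercept; it is 0 elsewhere.
PIECES = [
    (Fr(0), Fr(1), Fr(1, 9), Fr(1, 9)),        # (1/9)(x + 1)
    (Fr(1), Fr(3, 2), Fr(4, 9), Fr(-2, 9)),    # (4/9)(x - 1/2)
    (Fr(3, 2), Fr(2), Fr(-4, 9), Fr(10, 9)),   # (4/9)(5/2 - x)
    (Fr(2), Fr(3), Fr(-1, 9), Fr(4, 9)),       # (1/9)(4 - x)
    (Fr(3), Fr(6), Fr(0), Fr(1, 9)),           # 1/9
]

def prob_interval(a, b):
    """Exact P[a <= X <= b]: integrate each linear piece over its overlap with [a, b]."""
    a, b = Fr(a), Fr(b)
    total = Fr(0)
    for lo, hi, m, c in PIECES:
        l, u = max(lo, a), min(hi, b)
        if l < u:
            # integral of (m x + c) from l to u
            total += m * (u * u - l * l) / 2 + c * (u - l)
    return total

print(prob_interval(-10, 10))                                      # 1
print(prob_interval(0, 2), prob_interval(1, 3), prob_interval(1, 2))  # 1/2 1/2 1/3
```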




► Example 2C. The lifetime of a light bulb. Consider the numerical-valued random phenomenon that consists in observing the total time a light bulb will burn from the moment it is first put into service. Suppose that the probability function P[·] of this phenomenon is expressed by (2.1) in terms of the function f(·) given by

$f(x) = 0$ for $x < 0$; $\quad f(x) = \dfrac{1}{1000}\, e^{-x/1000}$ for $x \ge 0$.

Let E be the event that the bulb burns between 100 and 1000 hours, inclusive, and let F be the event that the bulb burns more than 1000 hours. The events E and F may be represented as subsets of the real line: $E = \{x: 100 \le x \le 1000\}$ and $F = \{x: 1000 < x\}$. The probabilities of E and F are given by

$$P[E] = \int_{100}^{1000} \frac{1}{1000} e^{-x/1000}\,dx = -e^{-x/1000}\Big|_{100}^{1000} = e^{-0.1} - e^{-1} = 0.537,$$

$$P[F] = \int_{1000}^{\infty} \frac{1}{1000} e^{-x/1000}\,dx = -e^{-x/1000}\Big|_{1000}^{\infty} = e^{-1} = 0.368.$$ ◄
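Both probabilities in example 2C are differences of the tail function $e^{-x/1000}$. A minimal sketch with the standard `math` module (the name `survival` is illustrative):

```python
import math

MEAN = 1000.0  # the density in example 2C is (1/1000) e^{-x/1000} for x >= 0

def survival(x):
    """P[X > x] = e^{-x/1000} for x >= 0, and 1 for x < 0."""
    return 1.0 if x < 0 else math.exp(-x / MEAN)

p_E = survival(100) - survival(1000)  # P[100 <= X <= 1000]
p_F = survival(1000)                  # P[X > 1000]
print(round(p_E, 3), round(p_F, 3))   # 0.537 0.368
```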


For many probability functions there exists a function p(·), defined for all real numbers x, but with value p(x) equal to 0 for all x except for a finite or countably infinite set of values of x at which p(x) is positive, such that from p(·) the value of P[E] can be obtained for any event E by summation:

(2.7) $$P[E] = \sum_{\substack{\text{points } x \text{ in } E\\ \text{such that } p(x) > 0}} p(x).$$

In order that the sum in (2.7) may be meaningful, it suffices to impose the condition [letting E = R in (2.7)] that

(2.8) $$1 = \sum_{\substack{\text{points } x \text{ in } R\\ \text{such that } p(x) > 0}} p(x).$$

Given a probability function P[·], which may be represented in the form (2.7), we call the function p(·) the probability mass function of the probability function P[·], and we say that the probability function P[·] is specified by the probability mass function p(·).




A function p(·), defined for all real numbers, is said to be a probability mass function if (i) p(x) equals zero for all x, except for a finite or countably infinite set of values of x for which p(x) > 0, and (ii) the infinite series in (2.8) converges and sums to 1. Such a function is the probability mass function of a unique probability function P[·] defined on the subsets of the real line, namely the probability function with value P[E] at any set E given by (2.7).


► Example 2D. Computing probabilities from a probability mass function. Let us consider again the numerical-valued random phenomenon considered in examples 1A and 2B. Let us assume that the probability function P[·] of this phenomenon may be expressed by (2.7) in terms of the function p(·), whose graph is sketched in Fig. 2C. An algebraic formula for p(·) can be written as follows:

(2.9)
$$
p(x) = \begin{cases}
0 & \text{unless } x = (0.3)k \text{ for some } k = 0, 1, \ldots, 20,\\
\tfrac{1}{24} & \text{for } x = 0, 0.3, 0.6, 0.9, 2.1, 2.4, 2.7, 3.0,\\
\tfrac{1}{9} & \text{for } x = 1.2, 1.5, 1.8,\\
\tfrac{1}{30} & \text{for } x = 3.3, 3.6, 3.9, 4.2, 4.5, 4.8, 5.1, 5.4, 5.7, 6.0.
\end{cases}
$$

Fig. 2C. Graph of the probability mass function defined by (2.9).

It then follows that

$$P[A] = p(0) + p(0.3) + p(0.6) + p(0.9) + p(1.2) + p(1.5) + p(1.8) = \frac{1}{2},$$
$$P[B] = p(1.2) + p(1.5) + p(1.8) + p(2.1) + p(2.4) + p(2.7) + p(3.0) = \frac{1}{2},$$
$$P[AB] = p(1.2) + p(1.5) + p(1.8) = \frac{1}{3},$$

which agree with the values assumed in example 1A. ◄
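The sums in example 2D can be reproduced exactly with rational arithmetic. In this sketch, an illustrative choice, each support point x = (0.3)k is represented by its integer index k, which avoids comparing floating-point values such as 0.3 * 7; the names `mass` and `prob` are hypothetical.

```python
from fractions import Fraction as Fr

# Mass function of example 2D, indexed by k where x = 0.3 * k, k = 0, 1, ..., 20.
def mass(k):
    if k in (0, 1, 2, 3, 7, 8, 9, 10):  # x = 0, 0.3, 0.6, 0.9, 2.1, 2.4, 2.7, 3.0
        return Fr(1, 24)
    if k in (4, 5, 6):                  # x = 1.2, 1.5, 1.8
        return Fr(1, 9)
    if 11 <= k <= 20:                   # x = 3.3, 3.6, ..., 6.0
        return Fr(1, 30)
    return Fr(0)

def prob(lo_k, hi_k):
    """P[0.3*lo_k <= X <= 0.3*hi_k], computed by summation as in (2.7)."""
    return sum((mass(k) for k in range(lo_k, hi_k + 1)), Fr(0))

print(prob(0, 20))                           # 1, i.e. condition (2.8)
print(prob(0, 6), prob(4, 10), prob(4, 6))   # 1/2 1/2 1/3  (A, B, AB)
```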




The terminology of "density function" and "mass function" comes from the following physical representation of the probability function P[·] of a numerical-valued random phenomenon. We imagine that a unit mass of some substance is distributed over the real line in such a way that the amount of mass over any set B of real numbers is equal to P[B]. The distribution of substance possesses a density, to be denoted by f(x), at the point x, if for any interval containing the point x of length h (where h is a sufficiently small number) the mass of substance attached to the interval is approximately equal to f(x)h. The distribution of substance possesses a mass, to be denoted by p(x), at the point x, if there is a positive amount p(x) of substance concentrated at the point.

We shall see in section 3 that a probability function P[·] always possesses a probability density function and a probability mass function. Consequently, in order for a probability function to be specified by either its probability density function or its probability mass function, it is necessary (and, from a practical point of view, sufficient) that one of these functions vanish identically.


EXERCISES 


Verify that each of the functions f(·), given in exercises 2.1-2.5, is a probability density function (by showing that it satisfies (2.2) and (2.3)) and sketch its graph.*

Hint: use freely the facts developed in the appendix to this section.
2.1. (i) $f(x) = 1$ for $0 < x < 1$
$\quad\;\; = 0$ elsewhere.
(ii) $f(x) = 1 - |1 - x|$ for $0 < x < 2$
$\quad\;\; = 0$ elsewhere.
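(iii) $f(x) = \tfrac{1}{2}x^2$ for $0 < x \le 1$
$\quad\;\; = \tfrac{1}{2}\left[x^2 - 3(x - 1)^2\right]$ for $1 < x \le 2$
$\quad\;\; = \tfrac{1}{2}\left[x^2 - 3(x - 1)^2 + 3(x - 2)^2\right]$ for $2 < x < 3$
$\quad\;\; = 0$ elsewhere.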


1 
2.2, (i) f(z) = "s forü <# <1 
25 : elsewhere. 
ii for0 <2 <1 
d elsewhere. 
iii for |x| € 1 
me elsewhere. 


* The reader should note the convention used in the exercises of this book. When a function f(·) is defined by a single analytic expression for all x in $-\infty < x < \infty$, the fact that x varies between $-\infty$ and $\infty$ is not explicitly indicated.




1 | < 
25. @ /@ -—-—z for |x| < 1 
=6 elsewhere. 
2 1 Р 1 
Gi) f@ = 3 же for0 «x» < 
=0 elsewhere. 
. 1 1 
Gii) f@= Pra 
: 1 gh 
(iv) /@ = A (1 + 5) 
= о-= 2 > 0 
24. (i) f(z) - Е ? se) 
G) fæ) = Qe! 
(i) уе) = Y. 
; йы cm. 
(iv) fG) "ales 
25. à) f(a) = P 
È s d (Y 
G) — f()- Ed 
Gi) уе) = = ei fora > 0 
тта; 
= 0 elsewhere. 
(iv) $f(x) = 4x e^{-2x}$ for $x > 0$
$\quad\;\; = 0$ elsewhere.


Show that each of the functions p(·) given in exercises 2.6 and 2.7 is a probability mass function [by showing that it satisfies (2.8)], and sketch its graph.

Hint: use freely the facts developed in the appendix to this section.


2.6. (i) P(t) = 1 foras 20 

=$ fore = 1 
=0 otherwise. 

ка 6) (2\21 6-= 

Gi) р = (26) (5) T ENT. 
=й otherwise, 

зу 2 fs 

MET forz-1,2,, 
=0 otherwise, 

о 2s 

(iv) ps) = сше forz =0,1,2,--- 


=: otherwise. 




2.7.
8 4 
(i) yo) ~ Wea) fors = 0; 1; 2,3, 4,5, 6 
(6) 
= 0 Ke otherwise. 
(ii) рб) = ( : B) б), for x = 0, 1, 2, ··· 
=0 otherwise. 
сүл —4 
(iii) pe) = ON {ог & = 0, 1,2; 3,4, 5, 6 
Cs) 
zd otherwise. 


2.8. The amount of bread (in hundreds of pounds) that a certain bakery is able to sell in a day is found to be a numerical-valued random phenomenon, with a probability function specified by the probability density function f(·), given by

$f(x) = Ax$ for $0 \le x < 5$
$\quad\;\; = A(10 - x)$ for $5 \le x < 10$
$\quad\;\; = 0$ otherwise.

(i) Find the value of A which makes f(·) a probability density function.

(ii) Graph the probability density function.

(iii) What is the probability that the number of pounds of bread that will be sold tomorrow is (a) more than 500 pounds, (b) less than 500 pounds, (c) between 250 and 750 pounds?

(iv) Denote, respectively, by A, B, and C, the events that the number of pounds of bread sold in a day is (a) greater than 500 pounds, (b) less than 500 pounds, (c) between 250 and 750 pounds. Find P[A | B], P[A | C]. Are A and B independent events? Are A and C independent events?
2.9. The length of time (in minutes) that a certain young lady speaks on the telephone is found to be a random phenomenon, with a probability function specified by the probability density function f(·), given by

$f(x) = Ae^{-x/5}$ for $x > 0$
$\quad\;\; = 0$ otherwise.


(i) Find the value of A that makes f(·) a probability density function.

(ii) Graph the probability density function.

(iii) What is the probability that the number of minutes that the young lady will talk on the telephone is (a) more than 10 minutes, (b) less than 5 minutes, (c) between 5 and 10 minutes?

(iv) For any real number b, let A(b) denote the event that the young lady talks longer than b minutes. Find P[A(b)]. Show that, for a > 0 and b > 0, P[A(a + b) | A(a)] = P[A(b)]. In words, the conditional probability that a telephone conversation will last more than a + b minutes, given that it has lasted at least a minutes, is equal to the unconditional probability that it will last more than b minutes.




2.10. The number of newspapers that a certain newsboy is able to sell in a day is found to be a numerical-valued random phenomenon, with a probability function specified by the probability mass function p(·), given by

$p(x) = Ax$ for $x = 1, 2, \ldots, 50$
$\quad\;\; = A(100 - x)$ for $x = 51, 52, \ldots, 100$
$\quad\;\; = 0$ otherwise.

(i) Find the value of A that makes p(·) a probability mass function.

(ii) Sketch the probability mass function.

(iii) What is the probability that the number of newspapers that will be sold tomorrow is (a) more than 50, (b) less than 50, (c) equal to 50, (d) between 25 and 75, inclusive, (e) an odd number?

(iv) Denote, respectively, by A, B, C, and D, the events that the number of newspapers sold in a day is (a) greater than 50, (b) less than 50, (c) equal to 50, (d) between 25 and 75, inclusive. Find P[A | B], P[A | C], P[A | D], P[C | D]. Are A and B independent events? Are A and D independent events? Are C and D independent events?


2.11. The number of times that a certain piece of equipment (say, a light switch) operates before having to be discarded is found to be a random phenomenon, with a probability function specified by the probability mass function p(·), given by

pE) = AQ)* fors =0,1,2,--- 
=0 otherwise. 


(i) Find the value of A which makes p(·) a probability mass function.

(ii) Sketch the probability mass function.

(iii) What is the probability that the number of times the equipment will operate before having to be discarded is (a) greater than 5, (b) an even number (regard 0 as even), (c) an odd number?

(iv) For any real number b, let A(b) denote the event that the number of times the equipment operates is greater than or equal to b. Find P[A(b)]. Show that, for any integers a > 0 and b > 0, P[A(a + b) | A(a)] = P[A(b)]. Express in words the meaning of this formula.


APPENDIX: THE EVALUATION OF INTEGRALS AND SUMS 


If (2.1) and (2.7) are to be useful expressions for evaluating the probability of an event, then techniques must be available for evaluating sums and integrals. The purpose of this appendix is to state some of the notions and formulas with which the student should become familiar and to collect some important formulas that the reader should learn to use, even if he lacks the mathematical background to justify them.
To begin with, let us note the following principle. If a function is defined by different analytic expressions over various regions, then to evaluate an integral whose integrand is this function one must express the integral as a sum of integrals corresponding to the different regions of definition of the function. For example, consider the probability density function f(·) defined by

(2.10)
$$
f(x) = \begin{cases}
x & \text{for } 0 \le x \le 1,\\
2 - x & \text{for } 1 \le x \le 2,\\
0 & \text{elsewhere.}
\end{cases}
$$

To prove that f(·) is a probability density function, we need to verify that (2.2) and (2.3) are satisfied. Clearly, (2.3) holds. Next,

$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{0} f(x)\,dx + \int_0^1 f(x)\,dx + \int_1^2 f(x)\,dx + \int_2^{\infty} f(x)\,dx = 0 + \int_0^1 x\,dx + \int_1^2 (2 - x)\,dx + 0 = \frac{x^2}{2}\Big|_0^1 + \left(2x - \frac{x^2}{2}\right)\Big|_1^2 = \frac{1}{2} + \frac{1}{2} = 1,$$

and (2.2) has been shown to hold. It might be noted that the function f(·) in (2.10) can be written somewhat more concisely in terms of the absolute value notation:

(2.11) $f(x) = 1 - |1 - x|$ for $0 \le x \le 2$; $f(x) = 0$ otherwise.


Next, in order to check his command of the basic techniques of integration, the reader should verify that the following formulas hold:

(2.12) $$\int \frac{e^{x}}{1 + e^{2x}}\,dx = \int \frac{dy}{1 + y^2} = \tan^{-1} e^{x} = \arctan e^{x} \qquad (y = e^{x}),$$


[= ах = fece stg 


An important integration formula, obtained by integration by parts, is the following, for any real number t for which the integrals make sense:

(2.13) $$\int x^{t-1} e^{-x}\,dx = -x^{t-1} e^{-x} + (t - 1)\int x^{t-2} e^{-x}\,dx.$$

Thus, for t = 2 we obtain

(2.14) $$\int x e^{-x}\,dx = -x e^{-x} + \int e^{-x}\,dx = -e^{-x}(x + 1).$$




We next consider the Gamma function Γ(·), which plays an important role in probability theory. It is defined for every t > 0 by

(2.15) $$\Gamma(t) = \int_0^{\infty} x^{t-1} e^{-x}\,dx.$$

The Gamma function is a generalization of the factorial function in the following sense. From (2.13) it follows that

(2.16) $$\Gamma(t) = (t - 1)\,\Gamma(t - 1).$$

Therefore, for any nonnegative integer r such that t − r > 0,

(2.17) $$\Gamma(t + 1) = t\,\Gamma(t) = t(t - 1)\cdots(t - r)\,\Gamma(t - r).$$

Since, clearly, Γ(1) = 1, it follows that for any integer n ≥ 0

(2.18) $$\Gamma(n + 1) = n!$$

Next, it may be shown that for any positive integer n

(2.19) $$\Gamma\left(n + \tfrac{1}{2}\right) = \frac{1 \cdot 3 \cdot 5 \cdots (2n - 1)}{2^{n}}\sqrt{\pi},$$

which may be written for any even integer n

(2.20) $$\Gamma\left(\frac{n + 1}{2}\right) = \frac{1 \cdot 3 \cdot 5 \cdots (n - 1)}{2^{n/2}}\sqrt{\pi},$$

since

(2.21) $$\Gamma\left(\tfrac{1}{2}\right) = \sqrt{\pi}.$$
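The identities (2.16), (2.18), (2.19), and (2.21) are easy to spot-check with Python's `math.gamma`. These floating-point comparisons under a relative tolerance are a sanity check, not a proof; the test value t = 3.7 is arbitrary.

```python
import math

for n in range(1, 8):
    # (2.18): Gamma(n + 1) = n!
    assert math.isclose(math.gamma(n + 1), math.factorial(n))
    # (2.19): Gamma(n + 1/2) = 1*3*5*...*(2n - 1) / 2**n * sqrt(pi)
    odd_product = math.prod(range(1, 2 * n, 2))
    assert math.isclose(math.gamma(n + 0.5), odd_product / 2**n * math.sqrt(math.pi))

t = 3.7
assert math.isclose(math.gamma(t), (t - 1) * math.gamma(t - 1))   # (2.16)
assert math.isclose(math.gamma(0.5), math.sqrt(math.pi))          # (2.21)
print("Gamma identities hold to floating-point accuracy")
```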


We prove (2.21) by showing that Γ(½) is equal to another integral whose value we need. In (2.15), make the change of variable $x = \tfrac{1}{2}y^2$, and let $t = (n + 1)/2$. Then, for any integer n = 0, 1, ..., we have the formula

(2.22) $$\Gamma\left(\frac{n + 1}{2}\right) = \frac{1}{2^{(n-1)/2}}\int_0^{\infty} y^{n} e^{-y^2/2}\,dy.$$

In view of (2.22), to establish (2.21) we need only show that

(2.23) $$\Gamma\left(\tfrac{1}{2}\right) = \sqrt{2}\int_0^{\infty} e^{-y^2/2}\,dy = \frac{\sqrt{2}}{2}\int_{-\infty}^{\infty} e^{-y^2/2}\,dy = \sqrt{\pi}.$$

We prove (2.23) by proving the following basic formula: for any u > 0

(2.24) $$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\frac{1}{2}uy^2}\,dy = \frac{1}{\sqrt{u}}.$$




Equation (2.24) may be derived as follows. Let I be the value of the integral in (2.24). Then I² is a product of two single integrals. By the theorem for the evaluation of double integrals, it then follows that

(2.25) $$I^2 = \frac{1}{2\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\exp\left[-\tfrac{1}{2}u(x^2 + y^2)\right]dx\,dy.$$

We now evaluate the double integral in (2.25) by means of a change of variables to polar coordinates. Then

$$I^2 = \frac{1}{2\pi}\int_0^{2\pi} d\theta\int_0^{\infty} e^{-\frac{1}{2}ur^2}\, r\,dr = \int_0^{\infty} e^{-\frac{1}{2}ur^2}\, r\,dr = \frac{1}{u},$$

so that $I = 1/\sqrt{u}$, which proves (2.24).
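Formula (2.24) can be checked numerically by truncating the range of integration, since the integrand is negligible far from the origin. A rough sketch; the truncation half-width of 40 and the number of midpoints are arbitrary choices that are adequate for moderate values of u.

```python
import math

def gaussian_integral(u, half_width=40.0, n=100_000):
    """Midpoint-rule approximation of (1/sqrt(2*pi)) * integral of exp(-u*y**2/2) dy,
    with the real line truncated to [-half_width, half_width]."""
    h = 2 * half_width / n
    total = 0.0
    for i in range(n):
        y = -half_width + (i + 0.5) * h
        total += math.exp(-0.5 * u * y * y)
    return total * h / math.sqrt(2 * math.pi)

for u in (0.25, 1.0, 4.0):
    print(u, gaussian_integral(u), 1 / math.sqrt(u))  # the last two columns agree
```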

For large values of t there is an important asymptotic formula for the Gamma function, which is known as Stirling's formula. Taking t = n + 1, in which n is a positive integer, this formula can be written

(2.26) $$\log_e n! = \left(n + \tfrac{1}{2}\right)\log_e n - n + \tfrac{1}{2}\log_e 2\pi + \frac{r(n)}{12n},$$
$$n! = \left(\frac{n}{e}\right)^{n}\sqrt{2\pi n}\; e^{r(n)/12n},$$

in which r(n) satisfies $1 - 1/(12n + 1) < r(n) < 1$. The proof of Stirling's formula may be found in many books. A particularly clear derivation is given by H. Robbins, "A Remark on Stirling's Formula," American Mathematical Monthly, Vol. 62 (1955), pp. 26-29.
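Solving the first line of (2.26) for r(n) gives a direct check of the stated bounds for small n. The sketch uses `math.lgamma(n + 1)` for $\log_e n!$; n is kept small because for large n the subtraction cancels almost all significant digits. The function name `stirling_r` is illustrative.

```python
import math

def stirling_r(n):
    """Solve log n! = (n + 1/2) log n - n + (1/2) log(2*pi) + r(n)/(12n) for r(n)."""
    main = (n + 0.5) * math.log(n) - n + 0.5 * math.log(2 * math.pi)
    return 12 * n * (math.lgamma(n + 1) - main)

for n in (1, 2, 5, 10, 50):
    r = stirling_r(n)
    assert 1 - 1 / (12 * n + 1) < r < 1, n
    approx = (n / math.e) ** n * math.sqrt(2 * math.pi * n)
    # n! divided by the leading Stirling approximation: slightly above 1
    print(n, round(r, 6), math.factorial(n) / approx)
```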

We next turn to the evaluation of sums and infinite sums. The major tool in the evaluation of infinite sums is Taylor's theorem, which states that under certain conditions a function g(x) may be expanded in a power series:

(2.27) $$g(x) = \sum_{k=0}^{\infty} g^{(k)}(0)\,\frac{x^k}{k!},$$

in which $g^{(k)}(0)$ denotes the value at x = 0 of the kth derivative $g^{(k)}(x)$ of g(x). Letting $g(x) = e^{x}$, we obtain


(2.28) $$e^{x} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots = \sum_{k=0}^{\infty}\frac{x^k}{k!}.$$

Take next $g(x) = (1 - x)^n$, in which n = 1, 2, .... Clearly

(2.29)
$$
g^{(k)}(x) = \begin{cases}
(-1)^k\, n(n - 1)\cdots(n - k + 1)(1 - x)^{n-k} & \text{for } k = 1, 2, \ldots, n,\\
0 & \text{for } k > n.
\end{cases}
$$

Consequently, for n = 1, 2, ...

(2.30) $$(1 - x)^n = \sum_{k=0}^{n}\binom{n}{k}(-x)^k, \qquad -\infty < x < \infty,$$

which is a special case of the binomial theorem. One may deduce the binomial theorem from (2.30) by setting $x = -b/a$.
We obtain an important generalization of the binomial theorem by taking $g(x) = (1 - x)^t$, in which t is any real number. For any real number t and any integer k = 1, 2, ... define the binomial coefficient

(2.31)
$$
\binom{t}{k} = \begin{cases}
\dfrac{t(t - 1)\cdots(t - k + 1)}{k!} & \text{for } k = 1, 2, \ldots,\\[4pt]
1 & \text{for } k = 0.
\end{cases}
$$

Note that for any positive number n

(2.32) $$\binom{-n}{k} = \frac{(-n)(-n - 1)\cdots(-n - k + 1)}{k!} = (-1)^k\binom{n + k - 1}{k}.$$

By Taylor's theorem, we obtain the important formula, for all real numbers t and $-1 < x < 1$,

(2.33) $$(1 - x)^t = \sum_{k=0}^{\infty}\binom{t}{k}(-x)^k.$$

For the case of n positive we may write, in view of (2.32),

(2.34) $$(1 - x)^{-n} = \sum_{k=0}^{\infty}\binom{n + k - 1}{k}x^k, \qquad |x| < 1.$$

Equation (2.34), with n = 1, is the familiar formula for the sum of a geometric series:

(2.35) $$\frac{1}{1 - x} = \sum_{k=0}^{\infty}x^k = 1 + x + x^2 + \cdots, \qquad |x| < 1.$$

Equation (2.34) with n = 2 and 3 yields the formulas

(2.36) $$\frac{1}{(1 - x)^2} = \sum_{k=0}^{\infty}(k + 1)x^k, \qquad \frac{1}{(1 - x)^3} = \frac{1}{2}\sum_{k=0}^{\infty}(k + 2)(k + 1)x^k, \qquad |x| < 1.$$
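Partial sums give a quick numerical check of (2.28) and (2.34). The sketch below uses `math.comb` (Python 3.8+) for the binomial coefficients in (2.34); the cutoffs of 40 and 400 terms and the sample arguments are arbitrary, but more than enough for the values chosen.

```python
import math

def exp_series(x, terms=40):
    """Partial sum of (2.28): sum over k < terms of x**k / k!."""
    return math.fsum(x**k / math.factorial(k) for k in range(terms))

def neg_binomial_series(x, n, terms=400):
    """Partial sum of (2.34): sum over k < terms of C(n + k - 1, k) * x**k, for |x| < 1."""
    return math.fsum(math.comb(n + k - 1, k) * x**k for k in range(terms))

print(exp_series(1.5), math.exp(1.5))
x = 0.6
for n in (1, 2, 3):  # n = 1 is (2.35); n = 2 and 3 give (2.36)
    print(n, neg_binomial_series(x, n), (1 - x) ** (-n))
```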


From (2.33) we may obtain another important formula. By a comparison of the coefficients of $x^n$ on both sides of the equation

$$(1 + x)^s(1 + x)^t = (1 + x)^{s+t},$$

we obtain for any real numbers s and t and any positive integer n

(2.37) $$\binom{s}{0}\binom{t}{n} + \binom{s}{1}\binom{t}{n - 1} + \cdots + \binom{s}{n}\binom{t}{0} = \binom{s + t}{n}.$$

If s and t are positive integers (2.37) could be verified by mathematical induction. A useful special case of (2.37) is when s = t = n; we then obtain (5.13) of Chapter 2.
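Because (2.37) is a polynomial identity in s and t, it can be confirmed exactly with rational arithmetic even for non-integer arguments. The sketch below implements the coefficient (2.31) directly (the names `gbinom` and `vandermonde_holds` are our own), and also spot-checks (2.32); the sample values of s, t, and n are arbitrary.

```python
import math
from fractions import Fraction as Fr

def gbinom(t, k):
    """Generalized binomial coefficient (2.31): t(t-1)...(t-k+1)/k!, equal to 1 for k = 0."""
    out = Fr(1)
    for j in range(k):
        out *= Fr(t) - j
    return out / math.factorial(k)

def vandermonde_holds(s, t, n):
    """Exact check of (2.37) for given s, t, and positive integer n."""
    lhs = sum(gbinom(s, j) * gbinom(t, n - j) for j in range(n + 1))
    return lhs == gbinom(Fr(s) + Fr(t), n)

assert vandermonde_holds(5, 7, 4)
assert vandermonde_holds(Fr(1, 2), Fr(-3, 4), 6)  # non-integer s and t
# Special case s = t = n: the sum of C(n, j)**2 equals C(2n, n).
assert sum(math.comb(10, j) ** 2 for j in range(11)) == math.comb(20, 10)
# (2.32): C(-n, k) = (-1)**k * C(n + k - 1, k), here with n = 3.
assert all(gbinom(-3, k) == (-1) ** k * math.comb(3 + k - 1, k) for k in range(8))
print("(2.32) and (2.37) verified for the sample cases")
```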


THEORETICAL EXERCISES 


2.1. Show that for any positive real numbers α, β, and t

(2.38) $$\frac{\beta^{\alpha}}{\Gamma(\alpha)}\int_0^{\infty} e^{-tx}\,x^{\alpha-1}e^{-\beta x}\,dx = \left(1 + \frac{t}{\beta}\right)^{-\alpha}.$$
2.2. Show for any t > 0 and n = 1, 2, ...
(2.39) 2 | "ne Milo dy = gatar” i г). 
0 
2.3. The integral

(2.40) $$B(m, n) = \int_0^1 x^{m-1}(1 - x)^{n-1}\,dx,$$

which converges if m and n are positive, defines a function of m and n, called the beta function. Show that the beta function is symmetrical in its arguments, B(m, n) = B(n, m), and may be expressed [letting $x = \sin^2\theta$ and $x = 1/(1 + y)$, respectively] by

(2.41) $$B(m, n) = 2\int_0^{\pi/2}\sin^{2m-1}\theta\,\cos^{2n-1}\theta\,d\theta = \int_0^{\infty}\frac{y^{n-1}}{(1 + y)^{m+n}}\,dy.$$

Show finally that the beta and gamma functions are connected by the relation

(2.42) $$B(m, n) = \frac{\Gamma(m)\,\Gamma(n)}{\Gamma(m + n)}.$$

Hint: By changing to polar coordinates, we have

$$\Gamma(m)\Gamma(n) = 4\int_0^{\infty}\int_0^{\infty}x^{2m-1}y^{2n-1}e^{-(x^2 + y^2)}\,dx\,dy = 4\int_0^{\pi/2}\cos^{2m-1}\theta\,\sin^{2n-1}\theta\,d\theta\int_0^{\infty}r^{2(m+n)-1}e^{-r^2}\,dr.$$

2.4. Use (2.41) and (2.42), with m = n = ½, to prove (2.23).

2.5. Prove that the integral defining the gamma function converges for any real number t > 0.




2.6. Prove that the integral defining the beta function converges for any real numbers m and n, such that m > 0 and n > 0.


2.7. Taylor's theorem with remainder. Show that if the function g(·) has a continuous nth derivative in some interval containing the origin then for x in this interval

(2.43) $$g(x) = g(0) + xg'(0) + \frac{x^2}{2!}g''(0) + \cdots + \frac{x^{n-1}}{(n - 1)!}g^{(n-1)}(0) + \frac{x^n}{(n - 1)!}\int_0^1(1 - t)^{n-1}g^{(n)}(xt)\,dt.$$

Hint: Show, for k = 2, 3, ..., n, that

$$\frac{x^{k-1}}{(k - 2)!}\int_0^1(1 - t)^{k-2}g^{(k-1)}(xt)\,dt = \frac{x^{k-1}}{(k - 1)!}g^{(k-1)}(0) + \frac{x^{k}}{(k - 1)!}\int_0^1(1 - t)^{k-1}g^{(k)}(xt)\,dt.$$

2.8. Lagrange's form of the remainder in Taylor's theorem. Show that if g(·) has a continuous nth derivative in the closed interval from 0 to x, where x may be positive or negative, then

(2.44) $$\int_0^1(1 - t)^{n-1}g^{(n)}(xt)\,dt = \frac{1}{n}\,g^{(n)}(\theta x)$$

for some number θ in the interval 0 < θ < 1.


3. DISTRIBUTION FUNCTIONS 


To describe completely a numerical-valued random phenomenon, one needs only to state its probability function. The probability function P[·] is a function of sets and for this reason is somewhat unwieldy to treat analytically. It would be preferable if there were a function of points (that is, a function of real numbers x), which would suffice to determine completely the probability function. In the case of a probability function specified by a probability density function or by a probability mass function, the density and mass functions provide a point function that determines the probability function. Now it may be shown that for any numerical-valued random phenomenon whatsoever there exists a point function, called the distribution function, which suffices to determine the probability function in the sense that the probability function may be reconstructed from the distribution function. The distribution function thus provides a point function that contains all the information necessary to describe the probability properties of the random phenomenon. Consequently, to study the general properties of numerical valued random phenomena without restricting ourselves to those whose probability functions are


SEC. 3 DISTRIBUTION FUNCTIONS 167 


specified by either a probability density function or by a probability mass 
function, it suffices to study the general properties of distribution functions. 

The distribution function F(·) of a numerical valued random phenomenon is defined as having as its value, at any real number x, the probability that an observed value of the random phenomenon will be less than or equal to the number x. In symbols, for any real number x,

(3.1) $$F(x) = P[\{\text{real numbers } x': x' \le x\}].$$


Before discussing the general properties of distribution functions, let us consider the distribution functions of numerical valued random phenomena, whose probability functions are specified by either a probability mass function or a probability density function. If the probability function is specified by a probability mass function p(·), then the corresponding distribution function F(·) for any real number x is given by

(3.2) $$F(x) = \sum_{\substack{\text{points } x' \le x\\ \text{such that } p(x') > 0}} p(x').$$

Equation (3.2) follows immediately from (3.1) and (2.7). If the probability function is specified by a probability density function f(·), then the corresponding distribution function F(·) for any real number x is given by

(3.3) $$F(x) = \int_{-\infty}^{x} f(x')\,dx'.$$

Equation (3.3) follows immediately from (3.1) and (2.1).

We may classify numerical valued random phenomena by classifying their distribution functions. To begin with, consider a random phenomenon whose probability function is specified by its probability mass function, so that its distribution function F(·) is given by (3.2). The graph y = F(x) then appears as it is shown in Fig. 3A; it consists of a sequence of horizontal line segments, each one higher than its predecessor. The points at which one moves from one line to the next are called the jump points of the distribution function F(·); they occur at all points x at which the probability mass function p(x) is positive. We define a discrete distribution function as one that is given by a formula of the form of (3.2), in terms of a probability mass function p(·), or equivalently as one whose graph (Fig. 3A) consists only of jumps and level stretches. The term discrete connotes the fact that the numerical valued random phenomenon corresponding to a discrete distribution function could be assigned, as its sample description space, the set consisting of the (at most countably infinite number of) points at which the graph of the distribution function jumps.

Let us next consider a numerical valued random phenomenon whose probability function is specified by a probability density function, so that


Fig. 3A. Graph of a discrete distribution function F(·) and of the probability mass function p(·) in terms of which F(·) is given.

Fig. 3B. Graph of a continuous distribution function F(·) and of the probability density function f(·) in terms of which F(·) is given.




Fig. 3C. Graph of a mixed distribution function.
its distribution function F(·) is given by (3.3). The graph y = F(x) then appears (Fig. 3B) as an unbroken curve. The function F(·) is continuous. However, even more is true; the derivative F′(x) exists at all points (except perhaps for a finite number of points) and is given by

(3.4) $$F'(x) = \frac{d}{dx}F(x) = f(x).$$

We define a continuous distribution function as one that is given by a formula of the form of (3.3) in terms of a probability density function. Most of the distribution functions arising in practice are either discrete or continuous. Nevertheless, it is important to realize that there are distribution functions, such as the one whose graph is shown in Fig. 3C,




that are neither discrete nor continuous. Such distribution functions are called mixed. A distribution function F(·) is called mixed if it can be written as a linear combination of two distribution functions, denoted by $F^{d}(\cdot)$ and $F^{c}(\cdot)$, which are discrete and continuous, respectively, in the following way: for any real number x

(3.5) $$F(x) = c_1F^{d}(x) + c_2F^{c}(x),$$

in which $c_1$ and $c_2$ are constants between 0 and 1, whose sum is one. The distribution function F(·), graphed in Fig. 3C, is mixed, since it is of the form (3.5), in which $F^{d}(\cdot)$ and $F^{c}(\cdot)$ are the distribution functions graphed in Fig. 3A and 3B, respectively.

Any numerical valued random phenomenon possesses a probability mass function p(·) defined as follows: for any real number x

(3.6) $$p(x) = P[\{\text{real numbers } x': x' = x\}].$$

Thus p(x) represents the probability that the random phenomenon will have an observed value equal to x. In terms of the representation of the probability function as a distribution of a unit mass over the real line, p(x) represents the mass (if any) concentrated at the point x. It may be shown that p(x) represents the size of the jump at x in the graph of the distribution function F(·) of the numerical valued random phenomenon. Consequently, p(x) = 0 for all x if and only if F(·) is continuous.

We now introduce the following notation. Given a numerical valued random phenomenon, we write X to denote the observed value of the random phenomenon. For any real numbers a and b we write $P[a \le X \le b]$ to mean the probability that an observed value X of the numerical valued random phenomenon lies in the interval a to b. It is important to keep in mind that $P[a \le X \le b]$ represents an informal notation for $P[\{x: a \le x \le b\}]$.

Some writers on probability theory call a number X determined by the outcome of a random experiment (as is the observed value X of a numerical valued random phenomenon) a random variable; in this sense the observed value of a numerical valued random phenomenon can be regarded as a random variable. For the present we may adopt the following definition: X is a random variable (or, equivalently, X is said to be the observed value of a numerical valued random phenomenon) if for every real number x there exists a probability (which we denote by $P[X \le x]$) that X is less than or equal to x.

Given an observed value X of a numerical valued random phenomenon with distribution function F(·) and probability mass function p(·), we have the following formulas for any real numbers a and b (in which a < b):

(3.7)
$$
\begin{aligned}
P[a < X \le b] &= P[\{x: a < x \le b\}] = F(b) - F(a)\\
P[a \le X \le b] &= P[\{x: a \le x \le b\}] = F(b) - F(a) + p(a)\\
P[a \le X < b] &= P[\{x: a \le x < b\}] = F(b) - F(a) + p(a) - p(b)\\
P[a < X < b] &= P[\{x: a < x < b\}] = F(b) - F(a) - p(b).
\end{aligned}
$$

To prove (3.7), define the events A, B, C, and D:

$$A = \{X \le a\}, \qquad B = \{X \le b\}, \qquad C = \{X = a\}, \qquad D = \{X = b\}.$$

Then (3.7) merely expresses the facts that (since $A \subset B$, $C \subset A$, $D \subset B$)

(3.8)
$$
\begin{aligned}
P[BA^{c}] &= P[B] - P[A]\\
P[BA^{c} \cup C] &= P[B] - P[A] + P[C]\\
P[BA^{c}D^{c} \cup C] &= P[B] - P[A] + P[C] - P[D]\\
P[BA^{c}D^{c}] &= P[B] - P[A] - P[D].
\end{aligned}
$$
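The four formulas (3.7) can be checked against direct summation for any discrete distribution. The sketch below uses a fair six-sided die purely as a hypothetical example (it does not appear in the text); `F` follows (3.2), and the sample endpoints include non-integer values so that endpoints with and without jumps are both exercised.

```python
from fractions import Fraction as Fr

# Hypothetical example: a fair die, so p(x) = 1/6 for x = 1, ..., 6 and 0 otherwise.
SUPPORT = range(1, 7)

def p(x):
    return Fr(1, 6) if x in SUPPORT else Fr(0)

def F(x):
    """Distribution function (3.2): sum of p(x') over jump points x' <= x."""
    return sum((p(k) for k in SUPPORT if k <= x), Fr(0))

def interval_probs(a, b):
    """The four formulas (3.7), valid for a < b."""
    return {
        "a < X <= b": F(b) - F(a),
        "a <= X <= b": F(b) - F(a) + p(a),
        "a <= X < b": F(b) - F(a) + p(a) - p(b),
        "a < X < b": F(b) - F(a) - p(b),
    }

def direct(a, b):
    """The same four probabilities, by summing p over the relevant points."""
    def total(cond):
        return sum((p(k) for k in SUPPORT if cond(k)), Fr(0))
    return {
        "a < X <= b": total(lambda k: a < k <= b),
        "a <= X <= b": total(lambda k: a <= k <= b),
        "a <= X < b": total(lambda k: a <= k < b),
        "a < X < b": total(lambda k: a < k < b),
    }

for a, b in [(2, 5), (1.5, 4), (0, 6), (3, 3.5)]:
    assert interval_probs(a, b) == direct(a, b)
print(interval_probs(2, 5))
```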
The use of (3.7) in solving probability problems posed in terms of distribution functions is illustrated in example 3A.
► Example 3A. Suppose that the duration in minutes of long distance telephone calls made from a certain city is found to be a random phenomenon, with a probability function specified by the distribution function F(·), given by

(3.9)
$$
F(x) = \begin{cases}
0 & \text{for } x < 0,\\
1 - \tfrac{1}{2}e^{-x/3} - \tfrac{1}{2}e^{-[x/3]} & \text{for } x \ge 0,
\end{cases}
$$

in which the expression [y] is defined for any real number y ≥ 0 as the largest integer less than or equal to y. What is the probability that the duration in minutes of a long distance telephone call is (i) more than six minutes, (ii) less than four minutes, (iii) equal to three minutes? What is the conditional probability that the duration in minutes of a long distance telephone call is (iv) less than nine minutes, given that it is more than five minutes, (v) more than five minutes, given that it is less than nine minutes?

Solution: The distribution function given by (3.9) is neither continuous nor discrete but mixed. Its graph is given in Fig. 3D. For the sake of brevity, we write X for the duration in minutes of a telephone call and P[X > 6] as an abbreviation in mathematical symbols of the verbal statement "the probability that a telephone call has a duration strictly greater than six minutes." The intuitive statement P[X > 6] is identified in our model with P[{x′: x′ > 6}], the value at the set {x′: x′ > 6} of the probability function P[·] corresponding to the distribution function F(·) given by (3.9). Consequently,

$$P[X > 6] = 1 - F(6) = \tfrac{1}{2}e^{-2} + \tfrac{1}{2}e^{-2} = e^{-2} = 0.135.$$




Next, the probability that the duration of a call will be less than four minutes (or, more concisely written, P[X < 4]) is equal to F(4) − p(4), in which p(4) is the jump in the distribution function F(·) at x = 4. A glance at the graph of F(·), drawn in Fig. 3D, reveals that the graph is unbroken at x = 4. Consequently, p(4) = 0, and

$$P[X < 4] = 1 - \tfrac{1}{2}e^{-4/3} - \tfrac{1}{2}e^{-1} = 0.684.$$


Fig. 3D. Graph of the distribution function given by (3.9).


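The probability P[X = 3] that an observed value X of the duration of a call is equal to 3 is given by

$$P[X = 3] = p(3) = \left(1 - \tfrac{1}{2}e^{-1} - \tfrac{1}{2}e^{-1}\right) - \left(1 - \tfrac{1}{2}e^{-1} - \tfrac{1}{2}e^{0}\right) = \tfrac{1}{2}\left(1 - e^{-1}\right) = 0.316,$$

in which p(3) is the jump in the graph of F(·) at x = 3. Solutions to parts (iv) and (v) of the example may be obtained similarly:

$$P[X < 9 \mid X > 5] = \frac{P[5 < X < 9]}{P[X > 5]} = \frac{F(9) - p(9) - F(5)}{1 - F(5)} = \frac{\tfrac{1}{2}e^{-5/3} + \tfrac{1}{2}e^{-1} - \tfrac{1}{2}e^{-3} - \tfrac{1}{2}e^{-2}}{\tfrac{1}{2}e^{-5/3} + \tfrac{1}{2}e^{-1}} = \frac{0.1858}{0.2784} = 0.667,$$

$$P[X > 5 \mid X < 9] = \frac{P[5 < X < 9]}{P[X < 9]} = \frac{F(9) - p(9) - F(5)}{F(9) - p(9)} = \frac{\tfrac{1}{2}e^{-5/3} + \tfrac{1}{2}e^{-1} - \tfrac{1}{2}e^{-3} - \tfrac{1}{2}e^{-2}}{1 - \tfrac{1}{2}e^{-3} - \tfrac{1}{2}e^{-2}} = \frac{0.1858}{0.9074} = 0.205.$$ ◄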
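The answers to example 3A can be reproduced from (3.9) in a few lines. The sketch computes the left limit F(x−) explicitly, using the fact that the floor term [x/3] in (3.9) jumps only at x = 3, 6, 9, ...; the names `F`, `F_left`, and `jump` are illustrative choices.

```python
import math

def F(x):
    """Distribution function (3.9): 1 - (1/2)e^{-x/3} - (1/2)e^{-[x/3]} for x >= 0."""
    if x < 0:
        return 0.0
    return 1 - 0.5 * math.exp(-x / 3) - 0.5 * math.exp(-math.floor(x / 3))

def F_left(x):
    """Left limit F(x-): just to the left of x > 0, [y/3] equals ceil(x/3) - 1."""
    if x <= 0:
        return 0.0
    k = math.ceil(x / 3) - 1
    return 1 - 0.5 * math.exp(-x / 3) - 0.5 * math.exp(-k)

def jump(x):
    """p(x) = F(x) - F(x-), as in (3.13)."""
    return F(x) - F_left(x)

p_gt6 = 1 - F(6)              # (i)   P[X > 6]
p_lt4 = F_left(4)             # (ii)  P[X < 4] = F(4) - p(4)
p_eq3 = jump(3)               # (iii) P[X = 3]
mid = F_left(9) - F(5)        # P[5 < X < 9] = F(9) - p(9) - F(5)
p_iv = mid / (1 - F(5))       # (iv)  P[X < 9 | X > 5]
p_v = mid / F_left(9)         # (v)   P[X > 5 | X < 9]
print(round(p_gt6, 3), round(p_lt4, 3), round(p_eq3, 3), round(p_iv, 3), round(p_v, 3))
# 0.135 0.684 0.316 0.667 0.205
```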


In section 2 we gave the conditions a function must satisfy in order to be 


SEC. 3 DISTRIBUTION FUNCTIONS 173 


a probability density function or a probability mass function. The question 
naturally arises as to the conditions a function must satisfy in order to bea 
distribution function. In advanced studies of probability theory it is 
shown that the properties a function F(-) must have in order to be a 
distribution function are the following: (i) F() must be nondecreasing 
in the sense that for any real numbers a and b 
(3.10) F(a) < F(b) ifa< b; 
(ii) the limits of F(x), as x tends to either plus or minus infinity, must exist and be given by

(3.11) $$\lim_{x \to -\infty} F(x) = 0, \qquad \lim_{x \to \infty} F(x) = 1;$$

(iii) at any point x the limit from the right $\lim_{b \to x+} F(b)$, which is defined as the limit of F(b) as b tends to x through values greater than x, must be equal to F(x),

(3.12) $$\lim_{b \to x+} F(b) = F(x),$$

so that at any point x the graph of F(x) is unbroken as one approaches x from the right; (iv) at any point x, the limit from the left, written F(x−) or $\lim_{a \to x-} F(a)$, which is defined as the limit of F(a) as a tends to x through values less than x, must be equal to F(x) − p(x); in symbols,

(3.13) $$F(x-) = \lim_{a \to x-} F(a) = F(x) - p(x),$$

where we define p(x) as the probability that the observed value of the random phenomenon is equal to x. Note that p(x) represents the size of the jump in the graph of F(x) at x.
From these facts it follows that the graph y = F(x) of a typical distribution function F(·) has as its asymptotes the lines y = 0 and y = 1. The graph is nondecreasing. However, it need not increase at every point but rather may be level (horizontal) over certain intervals. The graph need not be unbroken [that is, F(·) need not be continuous] at all points, but there is at most a countable infinity of points at which the graph has a break; at these points it jumps upward and possesses limits from the right and the left, satisfying (3.12) and (3.13).

The foregoing mathematical properties of the distribution function of a numerical valued random phenomenon serve to characterize completely such functions. It may be shown that for any function possessing the first three properties listed there is a unique set function P[·], defined on the Borel sets of the real line, satisfying axioms 1-3 of section 1 and the condition that for any finite real numbers a and b, at which a < b,

(3.14) $$P[\{\text{real numbers } x : a < x \le b\}] = F(b) - F(a).$$


174 NUMERICAL-VALUED RANDOM PHENOMENA CH. 4 


From this fact it follows that to specify the probability function it suffices 
to specify the distribution function. 

The fact that a distribution function is continuous does not imply that it may be represented in terms of a probability density function by a formula such as (3.3). If it can be so represented, it is said to be absolutely continuous. There also exists another kind of continuous distribution function, called singular continuous, whose derivative vanishes at almost all points. This is a somewhat difficult notion to picture, and examples have been constructed only by means of fairly involved analytic operations. From a practical point of view, one may act as if singular distribution functions do not exist, since examples of these functions are rarely, if ever, encountered in practice. It may be shown that any distribution function may be represented in the form

(3.15) $$F(x) = c_1 F^{(d)}(x) + c_2 F^{(ac)}(x) + c_3 F^{(sc)}(x),$$

in which $F^{(d)}(\cdot)$, $F^{(ac)}(\cdot)$, and $F^{(sc)}(\cdot)$, respectively, are discrete, absolutely continuous, and singular continuous distribution functions, and $c_1$, $c_2$, and $c_3$ are constants between 0 and 1, inclusive, the sum of which is 1. If it is assumed that the coefficient $c_3$ vanishes for any distribution function encountered in practice, it follows that in order to study the properties of a distribution function it suffices to study those that are discrete or continuous.
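The decomposition (3.15) can be seen concretely in a distribution function with both jumps and a smooth part. The sketch below assumes, as an illustration, the telephone-call distribution function F(x) = 1 − ½e^{−x/3} − ½e^{−[x/3]} for x ≥ 0 (F(x) = 0 for x < 0), which is consistent with the probabilities 0.684 and 0.316 computed earlier in this section. It checks that F is ½ times a discrete distribution function (jumps at 3, 6, 9, ...) plus ½ times an absolutely continuous (exponential) one, with no singular part. The function names are hypothetical.

```python
import math

def F(x: float) -> float:
    """Assumed telephone-call distribution function."""
    if x < 0:
        return 0.0
    return 1 - 0.5 * math.exp(-x / 3) - 0.5 * math.exp(-math.floor(x / 3))

def F_ac(x: float) -> float:
    """Absolutely continuous part: an exponential distribution function (no jumps)."""
    return 1 - math.exp(-x / 3) if x >= 0 else 0.0

def F_d(x: float) -> float:
    """Discrete part: a step function that jumps only at x = 3, 6, 9, ..."""
    return 1 - math.exp(-math.floor(x / 3)) if x >= 0 else 0.0

c1, c2, c3 = 0.5, 0.5, 0.0            # weights in (3.15); no singular part here

grid = [i / 10 for i in range(-20, 301)]
max_error = max(abs(F(x) - (c1 * F_d(x) + c2 * F_ac(x))) for x in grid)
print(max_error)                      # zero up to floating-point rounding
```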


THEORETICAL EXERCISES 


3.1. Show that the probability mass function p(·) of a numerical valued random phenomenon can be positive at no more than a countable infinity of points. Hint: For n = 2, 3, ..., define $E_n$ as the set of points x at which p(x) > 1/n. The size of $E_n$ is less than n, for if it were n or greater it would follow that $P[E_n] > 1$. Thus each of the sets $E_n$ is of finite size. Now the set E of points x at which p(x) > 0 is equal to the union $E_2 \cup E_3 \cup \cdots \cup E_n \cup \cdots$, since p(x) > 0 if and only if, for some integer n, p(x) > 1/n. The set E, being a union of a countable number of sets of finite size, is therefore proved to have at most a countable infinity of members.


EXERCISES 


3.1-3.7. For k = 1, 2, ..., 7, exercise 3.k is to sketch the distribution function corresponding to each probability density function or probability mass function given in exercise 2.k.


3.8. In the game of "odd man out" (described in section 3 of Chapter 3) the number of trials required to conclude the game, if there are 5 players, is a numerical valued random phenomenon, with a probability function specified by the distribution function F(·), given by

$$F(x) = \begin{cases} 0 & \text{for } x < 1 \\ 1 - \left(\tfrac{11}{16}\right)^{[x]} & \text{for } x \ge 1, \end{cases}$$

in which [x] denotes the largest integer less than or equal to x.

(i) Sketch the distribution function.

(ii) Is the distribution function discrete? If so, give a formula for its probability mass function.

(iii) What is the probability that the number of trials required to conclude the game will be (a) more than 3, (b) less than 3, (c) equal to 3, (d) between 2 and 5, inclusive?

(iv) What is the conditional probability that the number of trials required to conclude the game will be (a) more than 5, given that it is more than 3 trials, (b) more than 3, given that it is more than 5 trials?

3.9. Suppose that the amount of money (in dollars) that a person in a certain social group has saved is found to be a random phenomenon, with a probability function specified by the distribution function F(·), given by

$$F(x) = \begin{cases} \tfrac{1}{2} e^{-(x/50)^2} & \text{for } x < 0 \\ 1 - \tfrac{1}{2} e^{-(x/50)^2} & \text{for } x \ge 0. \end{cases}$$

Note that a negative amount of savings represents a debt.

(i) Sketch the distribution function.

(ii) Is the distribution function continuous? If so, give a formula for its probability density function.

(iii) What is the probability that the amount of savings possessed by a person in the group will be (a) more than 50 dollars, (b) less than −50 dollars, (c) between −50 dollars and 50 dollars, (d) equal to 50 dollars?

(iv) What is the conditional probability that the amount of savings possessed by a person in the group will be (a) less than 100 dollars, given that it is more than 50 dollars, (b) more than 50 dollars, given that it is less than 100 dollars?

3.10. Suppose that the duration in minutes of long-distance telephone calls made from a certain city is found to be a random phenomenon, with a probability function specified by the distribution function F(·), given by

$$F(x) = \begin{cases} 0 & \text{for } x < 0 \\ 1 - \tfrac{2}{3} e^{-x/3} - \tfrac{1}{3} e^{-[x/3]} & \text{for } x \ge 0. \end{cases}$$

(i) Sketch the distribution function.

(ii) Is the distribution function continuous? Discrete? Neither?

(iii) What is the probability that the duration in minutes of a long-distance telephone call will be (a) more than 6 minutes, (b) less than 4 minutes, (c) equal to 3 minutes, (d) between 4 and 7 minutes?

(iv) What is the conditional probability that the duration of a long-distance telephone call will be (a) less than 9 minutes, given that it has lasted more than 5 minutes, (b) less than 9 minutes, given that it has lasted more than 15 minutes?

3.11. Suppose that the time in minutes that a man has to wait at a certain subway station for a train is found to be a random phenomenon, with a probability function specified by the distribution function F(·), given by

$$F(x) = \begin{cases} 0 & \text{for } x < 0 \\ \tfrac{1}{2}x & \text{for } 0 \le x < 1 \\ \tfrac{1}{2} & \text{for } 1 \le x < 2 \\ \tfrac{1}{4}x & \text{for } 2 \le x < 4 \\ 1 & \text{for } x \ge 4. \end{cases}$$

(i) Sketch the distribution function.

(ii) Is the distribution function continuous? If so, give a formula for its probability density function.

(iii) What is the probability that the time the man will have to wait for a train will be (a) more than 3 minutes, (b) less than 3 minutes, (c) between 1 and 3 minutes?

(iv) What is the conditional probability that the time the man will have to wait for a train will be (a) more than 3 minutes, given that it is more than 1 minute, (b) less than 3 minutes, given that it is more than 1 minute?


3.12. Consider a numerical valued random phenomenon with distribution function

F(x) =0 forz <0 
=(})@ for0 <= <1 
= { forl sz<2 
=( = (ог2<=<3 
= $ for3 <= «4 
=( л>  for4<2<8 
= for8 < =, 


What is the conditional probability that the observed value of the random phenomenon will be between 2 and 5, given that it is between 1 and 6, inclusive?


4. PROBABILITY LAWS 


The notion of the probability law of a random phenomenon is introduced 
in this section in order to provide a concise and intuitively meaningful 
language for describing the probability properties of a random pheno- 
menon. 

In order to describe a numerical valued random phenomenon, it is 
necessary and sufficient to state its probability function P[·]; this is
equivalent to stating for any Borel set B of real numbers the probability 




that an observed value of the random phenomenon will be in the Borel set 
B. However, other functions exist, a knowledge of which is equivalent 
to a knowledge of the probability function. The distribution function is 
one such function, for between probability functions and distribution 
functions there is a one-to-one correspondence. Similarly, between 
discrete distribution functions and probability mass functions and between 
continuous distribution functions and probability density functions one- 
to-one correspondences exist. Thus we have available different, but 
equivalent, representations of the same mathematical concept, which we 
may call the probability law (or sometimes the probability distribution) of 
the numerical valued random phenomenon. 

A probability law is called discrete if it corresponds to a discrete 
distribution function and continuous if it corresponds to a continuous 
distribution function. 

For example, suppose one is considering the numerical valued random 
phenomenon that consists in observing the number of hits in five indepen- 
dent tosses of a dart at a target, where the probability at each toss of hitting 
the target is some constant p. To describe the phenomenon, one needs to 
know, by definition, the probability function P[-], which for any set E of 


real numbers is given by 


(4.1) $$P[E] = \sum_{k \in E\{0, 1, \ldots, 5\}} \binom{5}{k} p^k q^{5-k}, \qquad q = 1 - p.$$

It should be recalled that E{0, 1, ..., 5} represents the intersection of the sets E and {0, 1, ..., 5}.

Equivalently, one may describe the phenomenon by stating its distribution function F(·); this is done by giving the value of F(x) at any real number x,

(4.2) $$F(x) = \sum_{k=0}^{[x]} \binom{5}{k} p^k q^{5-k}.$$

It should be recalled that [x] denotes the largest integer less than or equal to x.

Equivalently, since the distribution function is discrete, one may describe the phenomenon by stating its probability mass function p(·), given by

(4.3) $$p(x) = \begin{cases} \binom{5}{x} p^x q^{5-x} & \text{for } x = 0, 1, \ldots, 5 \\ 0 & \text{otherwise.} \end{cases}$$

Equations (4.1), (4.2), and (4.3) constitute equivalent representations of the same concept, which we call the probability law of the random phenomenon. This particular probability law is discrete.
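The equivalence of (4.1), (4.2), and (4.3) can be checked mechanically. In this minimal sketch the value p = 0.3 is an arbitrary illustration and the function names are hypothetical; the probability function is evaluated only on finite sets E, which is enough because only the points 0, 1, ..., 5 carry probability.

```python
from math import comb, floor, isclose

N_TOSSES = 5   # five independent tosses of the dart

def pmf(x: int, p: float) -> float:
    """(4.3): C(5, x) p^x q^(5 - x) for x = 0, 1, ..., 5; 0 otherwise."""
    if 0 <= x <= N_TOSSES:
        return comb(N_TOSSES, x) * p**x * (1 - p) ** (N_TOSSES - x)
    return 0.0

def prob(E, p: float) -> float:
    """(4.1): sum of p(k) over k in E{0, 1, ..., 5}, for a finite set E of reals."""
    return sum(pmf(k, p) for k in range(N_TOSSES + 1) if k in E)

def cdf(x: float, p: float) -> float:
    """(4.2): sum of p(k) for k = 0, 1, ..., [x] (capped at 5)."""
    return sum(pmf(k, p) for k in range(0, min(floor(x), N_TOSSES) + 1))

p_hit = 0.3                                      # illustrative value of p
assert isclose(cdf(2.7, p_hit), prob({0, 1, 2}, p_hit))          # F(2.7) = P[{0, 1, 2}]
assert isclose(prob({1, 2.5, 4, 17}, p_hit), pmf(1, p_hit) + pmf(4, p_hit))
```

Points of E outside {0, 1, ..., 5}, such as 2.5 or 17, contribute nothing, which is exactly what the intersection E{0, 1, ..., 5} in (4.1) expresses.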




We next note that probability laws may be classified into families on the basis of similar functional form. For example, consider the function b(·; n, p) defined for any n = 1, 2, ... and 0 < p < 1 by

$$b(x; n, p) = \begin{cases} \binom{n}{x} p^x q^{n-x} & \text{for } x = 0, 1, \ldots, n \\ 0 & \text{otherwise,} \end{cases} \qquad q = 1 - p.$$

For fixed values of n and p the function b(·; n, p) is a probability mass function and thus defines a probability law. The probability laws determined by $b(\cdot; n_1, p_1)$ and $b(\cdot; n_2, p_2)$ for two different sets of values $n_1, p_1$ and $n_2, p_2$ are different. Nevertheless, the common functional form of the two functions $b(\cdot; n_1, p_1)$ and $b(\cdot; n_2, p_2)$ enables us to treat simultaneously the two probability laws that they determine. We call n and p parameters, and b(·; n, p) the probability mass function of the binomial probability law with parameters n and p.

We next list some frequently occurring discrete probability laws, to be 
followed by a list of some frequently occurring continuous probability laws. 

The Bernoulli probability law with parameter p, where 0 < p < 1, is specified by the probability mass function

(4.4) $$p(x) = \begin{cases} p & \text{for } x = 1 \\ 1 - p = q & \text{for } x = 0 \\ 0 & \text{otherwise.} \end{cases}$$

An example of a numerical valued random phenomenon obeying the Bernoulli probability law with parameter p is the outcome of a Bernoulli trial in which the probability of success is p, if instead of denoting success and failure by s and f, we denote them by 1 and 0, respectively.
The binomial probability law with parameters n and p, where n = 1, 2, ..., and 0 < p < 1, is specified by the probability mass function

(4.5) $$p(x) = \begin{cases} \binom{n}{x} p^x (1 - p)^{n-x} & \text{for } x = 0, 1, \ldots, n \\ 0 & \text{otherwise.} \end{cases}$$

An important example of a numerical valued random phenomenon obeying the binomial probability law with parameters n and p is the number of successes in n independent repeated Bernoulli trials in which the probability of success at each trial is p.

The Poisson probability law with parameter λ, where λ > 0, is specified by the probability mass function

(4.6) $$p(x) = \begin{cases} e^{-\lambda} \dfrac{\lambda^x}{x!} & \text{for } x = 0, 1, 2, \ldots \\ 0 & \text{otherwise.} \end{cases}$$




In section 3 of Chapter 3 it was seen that the Poisson probability law 
provides under certain conditions an approximation to the binomial 
probability law. In section 4 of Chapter 6 we discuss random phenomena 
that obey the Poisson probability law. 

The geometric probability law with parameter p, where 0 < p < 1, is specified by the probability mass function

(4.7) $$p(x) = \begin{cases} p(1 - p)^{x-1} & \text{for } x = 1, 2, \ldots \\ 0 & \text{otherwise.} \end{cases}$$

An important example of a numerical valued random phenomenon obeying the geometric probability law with parameter p is the number of trials required to obtain the first success in a sequence of independent repeated Bernoulli trials in which the probability of success at each trial is p.

The hypergeometric probability law with parameters N, n, and p (where N may be any integer 1, 2, ...; n is an integer in the set 1, 2, ..., N; and p = 0, 1/N, 2/N, ..., 1) is specified by the probability mass function, letting q = 1 − p,

(4.8) $$p(x) = \begin{cases} \dfrac{\binom{Np}{x}\binom{Nq}{n-x}}{\binom{N}{n}} & \text{for } x = 0, 1, \ldots, n \\ 0 & \text{otherwise.} \end{cases}$$

The hypergeometric probability law may also be defined by using (2.31), for any value of p in the interval 0 to 1. An example of a random phenomenon obeying the hypergeometric probability law is given by the number of white balls contained in a sample of size n drawn without replacement from an urn containing N balls, of which Np are white.

The negative binomial probability law with parameters r and p, where r = 1, 2, ... and 0 < p < 1, is specified by the probability mass function, letting q = 1 − p,

(4.9) $$p(x) = \begin{cases} \binom{r + x - 1}{x} p^r q^x & \text{for } x = 0, 1, \ldots \\ 0 & \text{otherwise.} \end{cases}$$

An example of a random phenomenon obeying the negative binomial probability law with parameters r and p is the number of failures encountered in a sequence of independent repeated Bernoulli trials (with probability p of success at each trial) before the rth success. Note that the number of trials required to achieve the rth success is equal to r plus the number of failures encountered before the rth success is met.
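The six discrete laws (4.4) through (4.9) translate directly into code. This is a sketch, not a library implementation: the parameter values in the checks are arbitrary, infinite supports are truncated far into the tail, and the hypergeometric function assumes, as (4.8) requires, that Np is an integer. The last check illustrates the remark on counting failures: with r = 1, "x failures before the first success" is the same event as "x + 1 trials to the first success".

```python
from math import comb, exp, factorial, isclose

def bernoulli(x, p):                  # (4.4)
    return {1: p, 0: 1 - p}.get(x, 0.0)

def binomial(x, n, p):                # (4.5)
    return comb(n, x) * p**x * (1 - p) ** (n - x) if 0 <= x <= n else 0.0

def poisson(x, lam):                  # (4.6)
    return exp(-lam) * lam**x / factorial(x) if x >= 0 else 0.0

def geometric(x, p):                  # (4.7): trials up to and including the first success
    return p * (1 - p) ** (x - 1) if x >= 1 else 0.0

def hypergeometric(x, N, n, p):       # (4.8): N balls, Np white, sample of size n
    white = round(N * p)              # Np, assumed to be an integer
    if not 0 <= x <= n:
        return 0.0
    return comb(white, x) * comb(N - white, n - x) / comb(N, n)

def negative_binomial(x, r, p):       # (4.9): failures before the r-th success
    return comb(r + x - 1, x) * p**r * (1 - p) ** x if x >= 0 else 0.0

# Each mass function sums to 1 over its support.
totals = {
    "bernoulli": sum(bernoulli(x, 0.3) for x in (0, 1)),
    "binomial": sum(binomial(x, 10, 0.3) for x in range(11)),
    "poisson": sum(poisson(x, 2.5) for x in range(100)),
    "geometric": sum(geometric(x, 0.3) for x in range(1, 200)),
    "hypergeometric": sum(hypergeometric(x, 20, 5, 0.25) for x in range(6)),
    "negative_binomial": sum(negative_binomial(x, 3, 0.3) for x in range(400)),
}
print(totals)

# r = 1: x failures before the first success <=> x + 1 trials to the first success
assert all(isclose(negative_binomial(x, 1, 0.3), geometric(x + 1, 0.3)) for x in range(20))
```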




Some important continuous probability laws are the following. 

The uniform probability law over the interval a to b, where a and b are any finite real numbers such that a < b, is specified by the probability density function

(4.10) $$f(x) = \begin{cases} \dfrac{1}{b - a} & \text{for } a < x < b \\ 0 & \text{otherwise.} \end{cases}$$

Examples of random phenomena obeying a uniform probability law are discussed in section 5.

The normal probability law with parameters m and σ, where −∞ < m < ∞ and σ > 0, is specified by the probability density function

(4.11) $$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - m}{\sigma}\right)^2}, \qquad -\infty < x < \infty.$$

The role played by the normal probability law in probability theory is discussed in Chapter 6. In section 6 we introduce certain functions that are helpful in the study of the normal probability law.

The exponential probability law with parameter λ, in which λ > 0, is specified by the probability density function

(4.12) $$f(x) = \begin{cases} \lambda e^{-\lambda x} & \text{for } x \ge 0 \\ 0 & \text{otherwise.} \end{cases}$$

The gamma probability law with parameters r and λ, in which r = 1, 2, ... and λ > 0, is specified by the probability density function

(4.13) $$f(x) = \begin{cases} \dfrac{\lambda}{(r - 1)!}(\lambda x)^{r-1} e^{-\lambda x} & \text{for } x > 0 \\ 0 & \text{otherwise.} \end{cases}$$

The exponential and gamma probability laws are discussed in Chapter 6.
The Cauchy probability law with parameters α and β, in which −∞ < α < ∞ and β > 0, is specified by the probability density function

(4.14) $$f(x) = \frac{1}{\pi\beta\left[1 + \left(\dfrac{x - \alpha}{\beta}\right)^2\right]}, \qquad -\infty < x < \infty.$$
Student's distribution with parameter n = 1, 2, ... (also called Student's t-distribution with n degrees of freedom) is specified by the probability density function

(4.15) $$f(x) = \frac{1}{\sqrt{n\pi}}\,\frac{\Gamma[(n + 1)/2]}{\Gamma(n/2)}\left(1 + \frac{x^2}{n}\right)^{-(n+1)/2}, \qquad -\infty < x < \infty.$$




It should be noted that Student's distribution with parameter n = 1 coincides with the Cauchy probability law with parameters α = 0 and β = 1.
The χ² distribution with parameters n = 1, 2, ... and σ > 0 is specified by the probability density function

(4.16) $$f(x) = \begin{cases} \dfrac{1}{2^{n/2}\sigma^n\,\Gamma(n/2)}\, x^{(n/2)-1} e^{-x/(2\sigma^2)} & \text{for } x > 0 \\ 0 & \text{for } x < 0. \end{cases}$$

The symbol χ is the Greek letter chi, and one sometimes writes chi-square for χ². The χ² distribution with parameters n and σ = 1 is called in statistics the χ² distribution with n degrees of freedom. The χ² distribution with parameters n and σ coincides with the gamma distribution with parameters r = n/2 and λ = 1/(2σ²) [to define the gamma probability law for noninteger r, replace (r − 1)! in (4.13) by Γ(r)].

The χ distribution with parameters n = 1, 2, ... and σ > 0 is specified by the probability density function

(4.17) $$f(x) = \begin{cases} \dfrac{2\,(n/2)^{n/2}}{\sigma^n\,\Gamma(n/2)}\, x^{n-1} e^{-(n/2\sigma^2)x^2} & \text{for } x > 0 \\ 0 & \text{for } x < 0. \end{cases}$$

The χ distribution with parameters n and σ = 1 is often called the chi distribution with n degrees of freedom. (The relation between the χ² and χ distributions is given in exercise 8.1 of Chapter 7.)
The Rayleigh distribution with parameter α > 0 is specified by the probability density function

(4.18) $$f(x) = \begin{cases} \dfrac{1}{\alpha^2}\, x\, e^{-x^2/(2\alpha^2)} & \text{for } x > 0 \\ 0 & \text{for } x < 0. \end{cases}$$

The Rayleigh distribution coincides with the χ distribution with parameters n = 2 and σ = α√2.
The Maxwell distribution with parameter α > 0 is specified by the probability density function

(4.19) $$f(x) = \begin{cases} \dfrac{4}{\sqrt{\pi}}\,\dfrac{1}{\alpha^3}\, x^2 e^{-x^2/\alpha^2} & \text{for } x > 0 \\ 0 & \text{for } x < 0. \end{cases}$$

The Maxwell distribution with parameter α coincides with the χ distribution with parameters n = 3 and σ = α√(3/2).
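The four coincidences stated above (Student with n = 1 and Cauchy; χ² and gamma; Rayleigh and χ with n = 2; Maxwell and χ with n = 3) can be spot-checked numerically. This sketch transcribes the densities in the forms (4.13) through (4.19) given in this section, with `math.gamma` standing in for (r − 1)! so that noninteger r is allowed; the parameter values and sample points are arbitrary, and agreement at a handful of points is a sanity check, not a proof.

```python
from math import exp, gamma, isclose, pi, sqrt

def cauchy(x, alpha, beta):                      # (4.14)
    return 1 / (pi * beta * (1 + ((x - alpha) / beta) ** 2))

def student_t(x, n):                             # (4.15)
    c = gamma((n + 1) / 2) / (sqrt(n * pi) * gamma(n / 2))
    return c * (1 + x * x / n) ** (-(n + 1) / 2)

def gamma_density(x, r, lam):                    # (4.13) with (r - 1)! replaced by Gamma(r)
    return lam * (lam * x) ** (r - 1) * exp(-lam * x) / gamma(r) if x > 0 else 0.0

def chi_square(x, n, sigma):                     # (4.16)
    if x <= 0:
        return 0.0
    c = 1 / (2 ** (n / 2) * sigma**n * gamma(n / 2))
    return c * x ** (n / 2 - 1) * exp(-x / (2 * sigma**2))

def chi(x, n, sigma):                            # (4.17)
    if x <= 0:
        return 0.0
    c = 2 * (n / 2) ** (n / 2) / (sigma**n * gamma(n / 2))
    return c * x ** (n - 1) * exp(-n * x * x / (2 * sigma**2))

def rayleigh(x, alpha):                          # (4.18)
    return x / alpha**2 * exp(-x * x / (2 * alpha**2)) if x > 0 else 0.0

def maxwell(x, alpha):                           # (4.19)
    return 4 / (sqrt(pi) * alpha**3) * x * x * exp(-x * x / alpha**2) if x > 0 else 0.0

n, sigma, alpha = 5, 1.5, 0.8                    # arbitrary test parameters
for x in (0.1, 0.5, 1.0, 2.5, 7.0):
    assert isclose(student_t(x, 1), cauchy(x, 0, 1), rel_tol=1e-9)
    assert isclose(chi_square(x, n, sigma), gamma_density(x, n / 2, 1 / (2 * sigma**2)), rel_tol=1e-9)
    assert isclose(rayleigh(x, alpha), chi(x, 2, alpha * sqrt(2)), rel_tol=1e-9)
    assert isclose(maxwell(x, alpha), chi(x, 3, alpha * sqrt(1.5)), rel_tol=1e-9)
print("all four coincidences hold at the sample points")
```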

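The F distribution with parameters m = 1, 2, ... and n = 1, 2, ... is specified by the probability density function

(4.20) $$f(x) = \begin{cases} \dfrac{\Gamma[(m + n)/2]}{\Gamma(m/2)\,\Gamma(n/2)}\left(\dfrac{m}{n}\right)^{m/2} \dfrac{x^{(m/2)-1}}{[1 + (m/n)x]^{(m+n)/2}} & \text{for } x > 0 \\ 0 & \text{for } x < 0. \end{cases}$$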


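The beta probability law with parameters a and b, in which a and b are positive real numbers, is specified by the probability density function

(4.21) $$f(x) = \begin{cases} \dfrac{\Gamma(a + b)}{\Gamma(a)\,\Gamma(b)}\, x^{a-1}(1 - x)^{b-1} & \text{for } 0 < x < 1 \\ 0 & \text{elsewhere.} \end{cases}$$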


THEORETICAL EXERCISES 


4.1. The probability law of the number of white balls in a sample drawn without replacement from an urn of random composition. Consider an urn containing N balls. Suppose that the number of white balls in the urn is a numerical valued random phenomenon obeying (i) a binomial probability law with parameters N and p, (ii) a hypergeometric probability law with parameters M, N, and p. [For example, suppose that the balls in the urn constitute a sample of size N drawn with replacement (without replacement) from a box containing M balls, of which a proportion p is white.] Let a sample of size n be drawn without replacement from the urn. Show that the number of white balls in the sample obeys either a binomial probability law with parameters n and p, or a hypergeometric probability law with parameters M, n, and p, depending on whether the number of white balls in the urn obeys a binomial or a hypergeometric probability law.


Hint: Establish the conditions under which the following statements are 
valid: 


(N) " (N = 5 (М); 


т m = К (т), ° 
е) oey 
(7) 




where 
pi) = (N) pgs, 


eoe) CGSN) 


р(т) 
| (x) A 
Finally, use the fact that 
Md [od (i) Fi =) 
H (М) 


EXERCISES 


4.1. Give formulas for, and identify, the probability law of each of the following 
numerical valued random phenomena: 
(i) The number of defectives in a sample of size 20, chosen without replace- 
ment from a batch of 200 articles, of which 5% are defective. 
(ii) The number of baby boys in a series of 30 independent births, assuming 
the probability at each birth that a boy will be born is 0.51. 
(iii) The minimum number of babies a woman must have in order to give 
birth to a boy (ignore multiple births, assume independence, and assume 
the probability at each birth that a boy will be born is 0.51). 
(iv) The number of patients in a group of 35 having a certain disease who 
will recover if the long-run frequency of recovery from this disease is 75% 
(assume that each patient has an independent chance to recover). 

In exercises 4.2-4.9 consider an urn containing 12 balls, numbered 1 to 12. Further, the balls numbered 1 to 8 are white, and the remaining balls are red. Give a formula for the probability law of the numerical valued random phenomenon described.


4.2. The number of white balls in a sample of size 6 drawn from the urn without replacement.

4.3. The number of white balls in a sample of size 6 drawn from the urn with replacement.

4.4. The smallest number occurring on the balls in a sample of size 6, drawn from the urn without replacement (see theoretical exercise 5.1 of Chapter 2).

4.5. The second smallest number occurring in a sample of size 6, drawn from the urn without replacement.

4.6. The minimum number of balls that must be drawn, when sampling without replacement, to obtain a white ball.

4.7. The minimum number of balls that must be drawn, when sampling with replacement, to obtain a white ball.

4.8. The minimum number of balls that must be drawn, when sampling without replacement, to obtain 2 white balls.

4.9. The minimum number of balls that must be drawn, when sampling with replacement, to obtain 2 white balls.


5. THE UNIFORM PROBABILITY LAW 


The notion of the uniform probability law (or uniform distribution) over the interval a to b, in which a and b are finite real numbers, is best defined in the following manner. Consider a numerical valued random phenomenon whose values can lie only in a certain finite interval S; that is, S = {real numbers x: a ≤ x ≤ b} for some finite numbers a and b. The random phenomenon is said to obey a uniform probability law over the finite interval S if the value P[B] of its probability function, at any interval B, satisfies the relation

(5.1) $$P[B] = \begin{cases} \dfrac{\text{length of } B}{\text{length of } S} & \text{if } B \text{ is a subset of } S \\ 0 & \text{if } B \text{ and } S \text{ have no points in common.} \end{cases}$$

It should be noted that knowing P[B] at intervals suffices to determine it on any Borel set B of real numbers.

From (5.1) one sees that the notion of a uniform distribution represents an extension of the notion of a finite sample description space S with equally likely descriptions, since in this case the probability P[A] of any event A on S is given by the formula

(5.2) $$P[A] = \frac{\text{size of } A}{\text{size of } S}.$$

There are many random phenomena for which it appears plausible to 
assume a uniform probability law. For example, suppose one is tossing a 
dart at a line marked 0 to 1. If one is always sure to land on the line and if 
one feels that any two intervals on the line of equal length have an equal 
chance of being hit, then one is led to conclude that the place at which the 
dart hits the line has a probability function satisfying (5.1), with S denoting 
the interval 0 to 1. 

The distribution function F(·) of a random phenomenon which obeys a uniform probability law over the interval a to b is obtained from (5.1):

(5.3) $$F(x) = \begin{cases} 0 & \text{if } x < a \\ \dfrac{x - a}{b - a} & \text{if } a \le x \le b \\ 1 & \text{if } x > b. \end{cases}$$




By differentiation, the probability density function may be obtained:

(5.4) $$f(x) = \begin{cases} \dfrac{1}{b - a} & \text{if } a < x < b \\ 0 & \text{otherwise.} \end{cases}$$

From (5.4) it follows that the definition of a uniform probability law given by (5.1) coincides with the definition given by (4.10). (See Fig. 5A.)


Fig. 5A. Probability density function f(x) and distribution function F(x) of the uniform probability law over (a) the interval 1 to 1.5, (b) the interval 1 to 3.


► Example 5A. Waiting time for a train. Between 7 A.M. and 8 A.M., trains leave a certain station at 3, 5, 8, 10, 13, 15, 18, 20, ... minutes after the hour. What is the probability that a person arriving at the station will have to wait less than a minute for a train, assuming that the person's time of arrival at the station obeys a uniform probability law over the interval of time (i) 7 A.M. to 8 A.M., (ii) 7:15 A.M. to 7:30 A.M., (iii) 7:02 A.M. to 7:15 A.M., (iv) 7:03 A.M. to 7:15 A.M., (v) 7:04 A.M. to 7:15 A.M.?

Solution: We must first find the set B of real numbers in which the person's arrival must lie in order for his waiting time to be less than 1 minute. One sees that B is the set of real numbers consisting of the intervals 2 to 3, 4 to 5, 7 to 8, 9 to 10, and so on. (See Fig. 5B.) The probability that the person will wait less than a minute for a train is given by P[B], which is equal to, in the various cases, (i) 24/60 = 2/5, (ii) 6/15 = 2/5, (iii) 6/13, (iv) 5/12, (v) 5/11. ◄


Fig. 5B. In order that the person discussed in example 5A will have to wait less than one minute for a train, his time of arrival X at the station (measured in minutes after 7 A.M.) must lie in the set B, consisting of the shaded intervals.
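Example 5A is an application of (5.1) with B a union of intervals: P[B] is the length of the part of B inside the arrival interval, divided by the length of that interval. The sketch below uses exact fractions. It assumes, as the pattern 3, 5, 8, 10, ... implies, that the last train before 8 A.M. leaves at minute 60; the helper names are hypothetical.

```python
from fractions import Fraction

def waiting_set(last_minute: int = 60):
    """B as intervals (t - 1, t), one for each departure t = 3, 5, 8, 10, ... (minutes after 7 A.M.)."""
    departures = [t for t in range(1, last_minute + 1) if t % 5 in (0, 3)]
    return [(t - 1, t) for t in departures]

def uniform_prob(intervals, lo: int, hi: int) -> Fraction:
    """(5.1) for an arrival time uniform on (lo, hi): length of B inside (lo, hi), over hi - lo."""
    covered = sum(max(0, min(b, hi) - max(a, lo)) for a, b in intervals)
    return Fraction(covered, hi - lo)

B = waiting_set()
cases = [(0, 60), (15, 30), (2, 15), (3, 15), (4, 15)]   # (i)-(v), minutes after 7 A.M.
results = [uniform_prob(B, lo, hi) for lo, hi in cases]
print(results)
```

Running it prints `[Fraction(2, 5), Fraction(2, 5), Fraction(6, 13), Fraction(5, 12), Fraction(5, 11)]`. Cases (iii) through (v) differ only in whether the arrival interval still contains the waiting interval 2 to 3, or any part of it.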


► Example 5B. The probability law of the second digit in the decimal expansion of a randomly chosen number. A number is chosen from the interval 0 to 1 by a random mechanism that obeys a uniform probability law over the interval. What is the probability that the second decimal place of the square root of the number will be the digit 3? Are the digits k = 0, 1, ..., 9 equally likely?

Solution: For k = 0, 1, ..., 9 let $B_k$ be the set of numbers on the unit interval whose square roots have a second decimal equal to the digit k. A number x belongs to $B_k$ if and only if √x satisfies, for some m = 0, 1, ..., 9,

$$m + \frac{k}{10} \le 10\sqrt{x} < m + \frac{k + 1}{10}$$

or

(5.5) $$\frac{1}{100}\left(m + \frac{k}{10}\right)^2 \le x < \frac{1}{100}\left(m + \frac{k + 1}{10}\right)^2.$$


The length of the interval described by (5.5) is

$$\frac{1}{100}\left[\left(m + \frac{k + 1}{10}\right)^2 - \left(m + \frac{k}{10}\right)^2\right] = \frac{1}{10{,}000}\,(20m + 2k + 1).$$
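Summing these lengths over m = 0, 1, ..., 9 gives the probability of $B_k$. The following sketch does the sum with exact fractions and, as an independent cross-check, simulates uniform draws and reads off the second decimal of the square root. The sample size and seed are arbitrary choices, and `prob_second_decimal` is a hypothetical helper name.

```python
import math
import random
from fractions import Fraction

def prob_second_decimal(k: int) -> Fraction:
    """P[B_k]: total length of the intervals (5.5) for m = 0, 1, ..., 9."""
    total = Fraction(0)
    for m in range(10):
        lower = Fraction((10 * m + k) ** 2, 10_000)       # (m + k/10)^2 / 100
        upper = Fraction((10 * m + k + 1) ** 2, 10_000)   # (m + (k+1)/10)^2 / 100
        total += upper - lower                            # = (20m + 2k + 1) / 10,000
    return total

exact = {k: prob_second_decimal(k) for k in range(10)}

# Monte Carlo cross-check: the second decimal of sqrt(x) is int(100 * sqrt(x)) mod 10
rng = random.Random(12345)
n = 200_000
hits = sum(int(math.sqrt(rng.random()) * 100) % 10 == 3 for _ in range(n))
print(float(exact[3]), hits / n)
```

The exact values grow by 0.002 with each increase of k, so the digits are not equally likely, although the ten probabilities still sum to 1.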




Hence the probability of the set $B_k$ is given by

$$P[B_k] = \frac{1}{10{,}000}\sum_{m=0}^{9}(20m + 2k + 1) = 0.091 + 0.002k.$$

In particular, $P[B_3] = 0.097$. ◄


EXERCISES


5.1. The time, measured in minutes, required by a certain man to travel from his home to a train station is a random phenomenon obeying a uniform probability law over the interval 20 to 25. If he leaves his home promptly at 7:05 A.M., what is the probability that he will catch a train that leaves the station promptly at 7:28 A.M.?

5.2. A radio station broadcasts the correct time every hour on the hour between the hours of 6 A.M. and 12 midnight. What is the probability that a listener will have to wait less than 10 minutes to hear the correct time if the time at which he tunes in is distributed uniformly over (chosen randomly from) the interval (i) 6 A.M. to 12 midnight, (ii) 8 A.M. to 6 P.M., (iii) 7:30 A.M. to 5:30 P.M., (iv) 7:30 A.M. to 5 P.M.?

5.3. The circumference of a wheel is divided into 37 arcs of equal length, which are numbered 0 to 36 (this is the principle of construction of a roulette wheel). The wheel is twirled. After the wheel comes to rest, the point on the wheel located opposite a certain fixed marker is noted. Assume that the point thus chosen obeys a uniform probability law over the circumference of the wheel. What is the probability that the point thus chosen will lie in an arc (i) with a number 1 to 10, inclusive, (ii) with an odd number, (iii) numbered 0?

5.4. A parachutist lands on the line connecting 2 towns, A and B. Suppose that the point at which he lands obeys a uniform probability law over the line. What is the probability that the ratio of his distance from A to his distance from B will be (i) greater than 3, (ii) equal to 3, (iii) greater than R, where R is a given real number?

5.5. An angle θ is chosen from the interval −π/2 to π/2 by a random mechanism that obeys a uniform probability law over the interval. A line is then drawn on an (x, y)-plane through the point (0, 1) at the angle θ with the y-axis. What is the probability, for any real number z, that the x-coordinate of the point at which the line intersects the x-axis will be less than z?

5.6. A number is chosen from the interval 0 to 1 by a random mechanism that obeys a uniform probability law over the interval. What is the probability that (i) its first decimal will be a 3, (ii) its second decimal will be a 3, (iii) its first 2 decimals will be 3's, (iv) any specified decimal will be a 3, (v) any 2 specified decimals will be 3's?

5.7. A number is chosen from the interval 0 to 1 by a random mechanism that obeys a uniform probability law over the interval. What is the probability that (i) the first decimal of its square root will be a 3, (ii) the negative of its logarithm (to the base e) will be less than 3?


6. THE NORMAL DISTRIBUTION AND DENSITY FUNCTIONS 


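A fundamental role in probability theory is played by the functions φ(·) and Φ(·), defined as follows: for any real number x

(6.1) $$\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2},$$

(6.2) $$\Phi(x) = \int_{-\infty}^{x} \phi(y)\, dy = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{1}{2}y^2}\, dy.$$

These functions satisfy

(6.3) $$\phi(-x) = \phi(x),$$

(6.4) $$\Phi(-x) = 1 - \Phi(x).$$

A table of Φ(x) for positive values of x is given in Table I (see p. 441).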




The function φ(x) is positive for all x. Further, from (2.24),

(6.5) $$\int_{-\infty}^{\infty} \phi(x)\, dx = 1,$$

so that φ(·) is a probability density function.
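Properties (6.3) through (6.5) are easy to confirm numerically. This sketch evaluates Φ through the standard identity Φ(x) = ½[1 + erf(x/√2)] using Python's `math.erf`, rather than from a table such as Table I, and approximates the integrals with a composite trapezoidal rule on [−10, 10], outside of which φ carries negligible mass. The step count is an arbitrary choice.

```python
from math import erf, exp, isclose, pi, sqrt

def phi(x: float) -> float:
    """(6.1): the standard normal density."""
    return exp(-x * x / 2) / sqrt(2 * pi)

def Phi(x: float) -> float:
    """(6.2), computed from the error function: (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def trapezoid(f, a: float, b: float, n: int = 20_000) -> float:
    """Composite trapezoidal rule for the integral of f from a to b."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))

total = trapezoid(phi, -10.0, 10.0)          # (6.5): should be 1
area_to_1_5 = trapezoid(phi, -10.0, 1.5)     # should match Phi(1.5)
print(total, area_to_1_5, Phi(1.5))
```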


The importance of the function Φ(·) arises from the fact that probabilities concerning random phenomena obeying a normal probability law with parameters m and σ are easily computed, since they may be expressed in terms of the tabulated function Φ(·). More precisely, consider a random phenomenon whose probability law is specified by the probability density function f(·), given by (4.11). The corresponding distribution function is given by

(6.6) $$F(x) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{1}{2}\left(\frac{y - m}{\sigma}\right)^2} dy = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{(x-m)/\sigma} e^{-\frac{1}{2}y^2}\, dy = \Phi\left(\frac{x - m}{\sigma}\right).$$

Fig. 6B. Graph of the normal distribution function Φ(x).


Consequently, if X is an observed value of a numerical valued random phenomenon obeying a normal probability law with parameters m and σ, then for any real numbers a and b (finite or infinite, in which a < b),

(6.7) $$P[a < X < b] = F(b) - F(a) = \Phi\left(\frac{b - m}{\sigma}\right) - \Phi\left(\frac{a - m}{\sigma}\right).$$
▶ Example 6A. "Grading on the curve." The properties of the normal distribution function provide the basis for the system of "grading on the curve" used in assigning final grades in large courses in American universities. Under this system, the letters A, B, C, D are used as passing grades. Of the students with passing grades, 15% receive A, 35% receive B, 35% receive C, and 15% receive D. The system is based on the assumption that the score X, which each student obtains on the examinations in the course, is an observed value of a numerical valued random phenomenon obeying a normal probability law with parameters m and σ (which the instructor can estimate, given the scores of many students). From (6.7) it follows that

$$\begin{aligned}
P[0 < X - m \le \sigma] &= P[-\sigma < X - m \le 0] = \Phi(1) - \Phi(0) = 0.3413,\\
P[\sigma < X - m \le 2\sigma] &= P[-2\sigma < X - m \le -\sigma] = \Phi(2) - \Phi(1) = 0.1359.
\end{aligned} \tag{6.8}$$

Therefore, if one assigns the letter A to a student whose score X is between m + σ and m + 2σ, one would expect 0.1359 (approximately 15%) of the students to receive a grade of A. Similarly, 0.3413 (approximately 35%) of the students receive a grade of B if B is assigned to a student with a score X between m and m + σ; approximately 35% receive C if C is assigned to a student with a score between m − σ and m; and approximately 15% receive D if D is assigned to a student with a score between m − 2σ and m − σ. ◀
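The band probabilities in (6.8) can be reproduced with a few lines of standard-library Python. This is a minimal sketch: Φ is written through the error function `math.erf`, and the names `Phi`, `bands`, `overall`, and `among_passing` are illustrative choices, not taken from the text. The second column rescales each band to students with passing grades (those within 2σ of m), which is the group the 15%/35%/35%/15% split refers to.

```python
from math import erf, sqrt

def Phi(z):
    """Standard normal distribution function, written with math.erf."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Grade bands in standard units z = (X - m) / sigma, as in Example 6A.
bands = {"A": (1, 2), "B": (0, 1), "C": (-1, 0), "D": (-2, -1)}

# Expected fraction of ALL students falling in each band, from (6.8).
overall = {g: Phi(hi) - Phi(lo) for g, (lo, hi) in bands.items()}

# Fraction among students with a passing grade (m - 2 sigma < X <= m + 2 sigma).
passing = sum(overall.values())
among_passing = {g: frac / passing for g, frac in overall.items()}

for g in bands:
    print(g, round(overall[g], 4), round(among_passing[g], 3))
```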
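The following example illustrates the use of (6.7) in solving problems involving random phenomena obeying normal probability laws.

▶ Example 6B. Consider a random phenomenon obeying a normal probability law with parameters m = 2 and σ = 2. The probability that an observed value X of the random phenomenon will have a value between 0 and 3 is given by

$$P[0 < X \le 3] = \frac{1}{2\sqrt{2\pi}} \int_0^3 e^{-\frac{1}{2}\left(\frac{y-2}{2}\right)^2} dy = \Phi\left(\frac{3-2}{2}\right) - \Phi\left(\frac{0-2}{2}\right) = \Phi\left(\tfrac{1}{2}\right) - \Phi(-1) = \Phi\left(\tfrac{1}{2}\right) + \Phi(1) - 1 = 0.533;$$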


SEC. 6 THE NORMAL DISTRIBUTION AND DENSITY FUNCTIONS 191 


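the probability that an observed value X of the random phenomenon will have a value between −1 and 1 is given by

$$P[-1 < X \le 1] = \Phi\left(\frac{1-2}{2}\right) - \Phi\left(\frac{-1-2}{2}\right) = \Phi\left(-\tfrac{1}{2}\right) - \Phi\left(-\tfrac{3}{2}\right) = \Phi\left(\tfrac{3}{2}\right) - \Phi\left(\tfrac{1}{2}\right) = 0.242.$$

The conditional probability that an observed value X of the random phenomenon will have a value between −1 and 1, given that it has a value between 0 and 3, is given by

$$P[-1 < X \le 1 \mid 0 < X \le 3] = \frac{P[0 < X \le 1]}{P[0 < X \le 3]} = \frac{\Phi\left(\frac{1-2}{2}\right) - \Phi\left(\frac{0-2}{2}\right)}{0.533} = \frac{0.150}{0.533} = 0.281.$$
◀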
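The computations of Example 6B follow mechanically from (6.7). The sketch below is a minimal illustration that assumes Φ is evaluated with `math.erf` instead of a printed table. `normal_prob` is a hypothetical helper name. The endpoints a and b may be ±infinity (`float("inf")`), matching the "finite or infinite" clause of (6.7).

```python
from math import erf, sqrt

def Phi(z):
    """Standard normal distribution function via math.erf; Phi(+inf) = 1, Phi(-inf) = 0."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def normal_prob(a, b, m, sigma):
    """P[a < X <= b] for a normal law with parameters m and sigma, as in (6.7)."""
    return Phi((b - m) / sigma) - Phi((a - m) / sigma)

m, sigma = 2.0, 2.0
p_0_3 = normal_prob(0, 3, m, sigma)
p_m1_1 = normal_prob(-1, 1, m, sigma)
# Conditional probability: {-1 < X <= 1} intersected with {0 < X <= 3} is {0 < X <= 1}.
p_cond = normal_prob(0, 1, m, sigma) / p_0_3
print(round(p_0_3, 3), round(p_m1_1, 3), round(p_cond, 3))
```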


The most widely available tables of the normal distribution are the Tables of the Normal Probability Functions, National Bureau of Standards, Applied Mathematics Series 23, Washington, 1953, which tabulate

$$\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2}, \qquad \frac{1}{\sqrt{2\pi}} \int_{-x}^{x} e^{-\frac{1}{2}y^2}\,dy$$

to 15 decimals for x = 0.0000 (0.0001) 1.0000 (0.001) 7.800.


THEORETICAL EXERCISES 


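6.1. One of the properties of the normal density functions which make them convenient to work with mathematically is the following identity. Verify algebraically that for any real numbers x, m₁, m₂, σ₁, and σ₂ (among which σ₁ and σ₂ are positive)

$$\exp\left[-\frac{1}{2}\left(\frac{x-m_1}{\sigma_1}\right)^2\right] \exp\left[-\frac{1}{2}\left(\frac{x-m_2}{\sigma_2}\right)^2\right] = \exp\left[-\frac{1}{2}\left(\frac{x-m}{\sigma}\right)^2\right] \exp\left[-\frac{1}{2}\,\frac{(m_1-m_2)^2}{\sigma_1^2+\sigma_2^2}\right], \tag{6.9}$$

where

$$m = \frac{m_1\sigma_2^2 + m_2\sigma_1^2}{\sigma_1^2 + \sigma_2^2}, \qquad \sigma^2 = \frac{\sigma_1^2\sigma_2^2}{\sigma_1^2 + \sigma_2^2}. \tag{6.10}$$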


6.2. Although it is not possible to obtain an explicit formula for the normal distribution function Φ(·) in terms of more familiar functions, it is possible to obtain various inequalities on Φ(x). Show the following inequality, which is particularly useful for large values of x: for any x > 0,

$$1 - \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-\frac{1}{2}y^2}\,dy < \frac{1}{x\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2}. \tag{6.11}$$

Hint: Use the fact that $\int_x^{\infty} y\, e^{-\frac{1}{2}y^2}\,dy = e^{-\frac{1}{2}x^2}$.

6.3. Prove (6.3) and (6.4). Hint: Verify that

$$\int_{-\infty}^{-x} \phi(y)\,dy = \int_{x}^{\infty} \phi(y)\,dy.$$
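The inequality (6.11) of theoretical exercise 6.2 can be checked numerically at a few points. This is a check, not the requested proof. The sketch computes 1 − Φ(x) with `math.erfc` to avoid cancellation for large x. The sample points are arbitrary. The ratio in the last column approaches 1 as x grows, which is why the bound is useful for large x.

```python
from math import erfc, exp, pi, sqrt

def phi(x):
    """Standard normal density."""
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

def upper_tail(x):
    """1 - Phi(x), computed with erfc to avoid cancellation when x is large."""
    return 0.5 * erfc(x / sqrt(2.0))

for x in (0.5, 1.0, 2.0, 3.0, 5.0, 8.0):
    bound = phi(x) / x  # right-hand side of (6.11)
    print(x, upper_tail(x), bound, upper_tail(x) / bound)
```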


EXERCISES 


6.1. Let X be the observed value of a numerical valued random phenomenon obeying a normal probability law with parameters (i) m = 0 and σ = 1, (ii) m = 0 and σ = 2. For α in 0 < α < 1 define J(α) and K(α) so that

$$P[X > J(\alpha)] = \alpha, \qquad P[|X| < K(\alpha)] = \alpha.$$

Find J(α) and K(α) for α = 0.05, 0.10, 0.50, 0.90, 0.95, 0.99.


6.2. Suppose that the life in hours of an electronic tube manufactured by a certain process is normally distributed with parameters m = 160 hours and σ. What is the maximum allowable value for σ if the life X of a tube is to have probability 0.80 of being between 120 and 200 hours?


6.3. Assume that the height in centimeters of a man aged 21 is a random phenomenon obeying a normal probability law with parameters m = 170 and σ = 5. What is the conditional probability that the height of a man aged 21 will be greater than 170 centimeters, given that it is greater than 160 centimeters?


6.4. A shirt manufacturer determines by observation that the circumference of the neck of a college man is a random phenomenon approximately obeying a normal probability law with parameters m = 14.25 inches and σ = 0.50 inches. For the purpose of determining how many shirts of a manufacturer's total production should have various collar sizes, compute for each of the sizes (measured in inches) 14, 14.25, 14.50, 14.75, 15.00, 15.25, 15.50, 15.75, and 16.00 the probability that a college man will wear a shirt collar of the given size, assuming that his collar size is the smallest
size more than $ of an inch larger than the circumference of his neck. 


6.5. A machine produces bolts in a length (in inches) found to obey a normal probability law with parameters m = 10 and σ = 0.10. The specifications for the bolt call for items with a length (in inches) equal to 10.05 ± 0.12. A bolt not meeting these specifications is called defective.

(i) What is the probability that a bolt produced by this machine will be defective?


SEC. 7 NUMERICAL n-TUPLE VALUED RANDOM PHENOMENA 193


(ii) If the machine were adjusted so that the length of bolts produced by it is normally distributed with parameters m = 10.10 and σ = 0.10, what is the probability that a bolt produced by the machine will be defective?

(iii) If the machine is adjusted so that the lengths of bolts produced by it are normally distributed with parameters m = 10.05 and σ = 0.06, what is the probability that a bolt produced by the machine will be defective?


6.6. Let 
g(x) = loup | = a G(x) = Я g(x’) de’ 
ow 2V Ir р 2 2 , E a T. 


Tabulate g(x) and G(x) for x = 0, ±1, ±2, ±3. Compare these functions with φ(x) and Φ(x) by plotting φ(x) and g(x) on one graph and Φ(x) and G(x) on a second graph.


6.7. Tabulate 


H(x) = a 4 ехр Е = 2 dy for z = 0, 1, 2, 3. 
2V2s Jaz "M 2 


Give a probabilistic meaning to H(x).


7. NUMERICAL n-TUPLE VALUED RANDOM PHENOMENA 


In many cases the result of a random experiment is not expressed by a single quantity but by a family of simultaneously observed quantities. Thus, to describe the outcome of the tossing of a pair of distinguishable dice, one requires a 2-tuple (x₁, x₂), in which x₁ denotes the number obtained on the first die and x₂ denotes the number obtained on the second die. Similarly, to describe the geographical location of an object (such as a ship), one requires a 2-tuple (x₁, x₂), whose components represent the latitude and longitude of the ship, respectively. One may want to describe the prices of some commodity (such as wheat or International Business Machines' common stock) on the first day of each month of a given year; to do this, one requires a 12-tuple (x₁, x₂, ..., x₁₂), whose components x₁, x₂, ..., x₁₂ represent the price of the commodity on the first day of January, February, March, ..., and December, respectively. On the other hand, one may want to describe the price of each of n commodities on a list (bread, milk, meats, shoes, electricity, etc.) on July 1 of a given year; to do this, one requires an n-tuple (x₁, x₂, ..., xₙ), whose components x₁, x₂, ..., xₙ represent the price on July 1 of the first commodity on the list, the second commodity on the list, and so on, up to the nth commodity on the list.

We are thus led to the notion of a numerical n-tuple valued random phenomenon, which we define as a random phenomenon whose sample description space is the set Rₙ consisting of all n-tuples (x₁, x₂, ..., xₙ) in which the components x₁, x₂, ..., xₙ are real numbers from −∞ to ∞. In this section we indicate the notation that is used to discuss numerical n-tuple valued random phenomena. We begin by considering the case of n = 2.

A numerical 2-tuple valued random phenomenon is described by stating its probability function P[·], whose value P[B] at any set B of 2-tuples of real numbers represents the probability that an observed occurrence of the random phenomenon will have a description lying in the set B. It is useful to think of the probability function P[·] as representing a distribution of a unit mass of some substance, which we call probability, over a 2-dimensional plane on which rectangular coordinates have been marked off, as in Fig. 7A. For any (probabilizable) set B of 2-tuples, P[B] states the weight of the probability substance distributed over the set B.

In order to know for all (probabilizable) sets B of 2-tuples the value P[B] of the probability function P[·] of a numerical 2-tuple valued random phenomenon, it suffices to know it for all real numbers x₁ and x₂ for the sets

$$B_{x_1,x_2} = \{\text{2-tuples } (x_1', x_2') : x_1' \le x_1,\ x_2' \le x_2\}. \tag{7.1}$$

In words, B_{x₁,x₂} is the set consisting of all 2-tuples (x₁′, x₂′) whose first component x₁′ is less than or equal to the specified real number x₁ and whose second component x₂′ is less than or equal to the specified real number x₂. We are thus led to introduce the distribution function F(·,·) of the numerical 2-tuple valued random phenomenon, which is a function of two variables, defined for all real numbers x₁ and x₂ by the equation

$$F(x_1, x_2) = P[B_{x_1,x_2}]. \tag{7.2}$$

The quantity F(x₁, x₂) represents the probability that an observed occurrence of the random phenomenon under consideration will have as its description a 2-tuple whose first component is less than or equal to x₁ and whose second component is less than or equal to x₂. In terms of the unit mass of probability distributed over the plane of Fig. 7A, F(x₁, x₂) is equal to the weight of the probability substance lying over the "infinitely extended rectangle," which consists of all 2-tuples (x₁′, x₂′) such that x₁′ ≤ x₁ and x₂′ ≤ x₂, which corresponds to the shaded area in Fig. 7A.

The probability assigned to any rectangle in the plane may also be expressed in terms of the distribution function F(·,·): for any real numbers a₁ and a₂ and any positive numbers h₁ and h₂,

$$\begin{aligned}
P[\{(x_1', x_2') : a_1 < x_1' \le a_1 + h_1,\ a_2 < x_2' \le a_2 + h_2\}] &= F(a_1 + h_1, a_2 + h_2) + F(a_1, a_2)\\
&\quad - F(a_1 + h_1, a_2) - F(a_1, a_2 + h_2).
\end{aligned} \tag{7.3}$$
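The rectangle formula (7.3) can be checked against a direct count. Purely for illustration, the sketch below assumes the 2-tuple of two distinguishable fair dice from the start of this section. `F`, `rect_prob`, and `rect_prob_direct` are hypothetical helper names. Exact fractions make the two computations compare equal exactly.

```python
from fractions import Fraction
from itertools import product
from math import floor

def F(x1, x2):
    """Distribution function of (first die, second die): P[x1' <= x1, x2' <= x2]."""
    c1 = min(max(floor(x1), 0), 6)  # number of faces <= x1
    c2 = min(max(floor(x2), 0), 6)  # number of faces <= x2
    return Fraction(c1 * c2, 36)

def rect_prob(a1, h1, a2, h2):
    """P[a1 < x1' <= a1 + h1, a2 < x2' <= a2 + h2] computed from F alone, via (7.3)."""
    return F(a1 + h1, a2 + h2) + F(a1, a2) - F(a1 + h1, a2) - F(a1, a2 + h2)

def rect_prob_direct(a1, h1, a2, h2):
    """The same probability by counting the 36 equally likely outcomes."""
    hits = sum(1 for d1, d2 in product(range(1, 7), repeat=2)
               if a1 < d1 <= a1 + h1 and a2 < d2 <= a2 + h2)
    return Fraction(hits, 36)

print(rect_prob(1, 2, 2.5, 3), rect_prob_direct(1, 2, 2.5, 3))
```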


As in the case of numerical valued random phenomena, the most important 




cases of numerical 2-tuple valued random phenomena are those in which 
the probability function is specified either by a probability mass function or 
a probability density function. 

Given a numerical 2-tuple valued random phenomenon, we define its probability mass function, denoted by p(·,·), as a function of two variables, defined for all real numbers x₁ and x₂ by the equation

$$p(x_1, x_2) = P[\{(x_1', x_2') : x_1' = x_1,\ x_2' = x_2\}]. \tag{7.4}$$

[Fig. 7A. The set R₂ of all 2-tuples (x₁′, x₂′) of real numbers, represented as a 2-dimensional plane on which a rectangular coordinate system has been imposed. Axes: x₁, x₂; the levels a₂ and a₂ + h₂ are marked on the x₂ axis.]

The quantity p(x₁, x₂) represents the probability that an observed occurrence of the random phenomenon under consideration will have as its description a 2-tuple whose first component is equal to x₁ and whose second component is equal to x₂. It may be shown that there is only a finite or countably infinite number of points at which p(x₁, x₂) > 0.

We define a numerical 2-tuple valued random phenomenon as obeying a discrete probability law if the sum of its probability mass function, over the points (x₁, x₂) where p(x₁, x₂) > 0, is equal to 1. Equivalently, the random phenomenon obeys a discrete probability law if its probability function P[·] is specified by its probability mass function p(·,·) by the formula, for any set B of 2-tuples,

$$P[B] = \sum_{\substack{\text{over } (x_1, x_2) \text{ lying in } B \\ \text{such that } p(x_1, x_2) > 0}} p(x_1, x_2). \tag{7.5}$$
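For a discrete law, (7.5) is a filtered sum over the points of positive mass. The sketch again assumes, as an illustrative choice, two distinguishable fair dice. It represents the set B as a Python predicate, a design choice of this sketch and not of the text.

```python
from fractions import Fraction
from itertools import product

# Probability mass function of (first die, second die): p(x1, x2) = 1/36 at each of 36 points.
pmf = {(d1, d2): Fraction(1, 36) for d1, d2 in product(range(1, 7), repeat=2)}

def prob(event):
    """P[B] via (7.5): sum p(x1, x2) over points with p > 0 lying in B.
    The set B is given as a predicate event(x1, x2) -> bool."""
    return sum((p for (x1, x2), p in pmf.items() if p > 0 and event(x1, x2)), Fraction(0))

print(sum(pmf.values()))                  # total mass; equal to 1 for a discrete law
print(prob(lambda x1, x2: x1 + x2 == 7))  # P[the two dice sum to 7]
print(prob(lambda x1, x2: x1 > x2))       # P[first die shows more than the second]
```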


In terms of the unit of probability mass distributed over the plane of Fig. 7A by the probability function P[·], a numerical 2-tuple valued random phenomenon obeys a discrete probability law if in order to distribute the corresponding unit probability mass one needs only attach a positive probability mass at each of a finite or countably infinite number of points.

We next consider numerical 2-tuple valued random phenomena whose probability functions P[·] may be specified in terms of a function f(·,·) of two variables, which we call its probability density function. For every (probabilizable) set B of 2-tuples

$$P[B] = \iint_B f(x_1, x_2)\,dx_1\,dx_2. \tag{7.6}$$

Equivalently, its distribution function F(·,·) satisfies, for every pair of real numbers x₁ and x₂,

$$F(x_1, x_2) = \int_{-\infty}^{x_2} \int_{-\infty}^{x_1} f(x_1', x_2')\,dx_1'\,dx_2'. \tag{7.7}$$

Consequently, the probability density function may be obtained from the distribution function F(·,·) by differentiation:

$$f(x_1, x_2) = \frac{\partial^2}{\partial x_1\,\partial x_2} F(x_1, x_2) \tag{7.8}$$

at all 2-tuples (x₁, x₂) where the second-order mixed partial derivative ∂²F(x₁, x₂)/∂x₁∂x₂ exists.
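Formula (7.8) can be illustrated numerically. For the sake of example only, assume F(x₁, x₂) = Φ(x₁)Φ(x₂), the law of two independent standard normal components, so that f(x₁, x₂) = φ(x₁)φ(x₂). The probability (7.3) of a small square centred at (x₁, x₂), divided by the square's area, then approximates the mixed partial derivative. The step h is an arbitrary choice of this sketch.

```python
from math import erf, exp, pi, sqrt

def Phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def phi(z):
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def F(x1, x2):
    """Assumed example: distribution function of two independent standard normal components."""
    return Phi(x1) * Phi(x2)

def mixed_partial(F, x1, x2, h=1e-3):
    """Approximate d^2 F / dx1 dx2 by the probability (7.3) of the square
    (x1 - h, x1 + h] x (x2 - h, x2 + h], divided by its area (2h)^2."""
    square = F(x1 + h, x2 + h) + F(x1 - h, x2 - h) - F(x1 + h, x2 - h) - F(x1 - h, x2 + h)
    return square / (2 * h) ** 2

for x1, x2 in [(0.0, 0.0), (1.0, -0.5), (-2.0, 1.5)]:
    print(x1, x2, mixed_partial(F, x1, x2), phi(x1) * phi(x2))
```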

In the case of numerical 2-tuple valued random phenomena it remains 
true, from a practical point of view, that the only random phenomena whose 
distribution functions F(·,·) are continuous, regarded as a function of
two variables, are those whose distribution functions are specified by a 
probability density function. Consequently, we shall say that a numerical 
2-tuple valued random phenomenon obeys a continuous probability law if 
its probability function and distribution function are specified by a proba- 
bility density function. 

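All the notions of this section extend immediately to numerical n-tuple valued random phenomena by reading n-tuple for 2-tuple and (x₁, x₂, ..., xₙ) for (x₁, x₂) in the foregoing discussion. In place of (7.1) to (7.8) read the following equations:

$$B_{x_1,\ldots,x_n} = \{\text{n-tuples } (x_1', \ldots, x_n') : x_1' \le x_1,\ x_2' \le x_2,\ \ldots,\ x_n' \le x_n\} \tag{7.1}$$

$$F(x_1, x_2, \ldots, x_n) = P[B_{x_1,\ldots,x_n}] \tag{7.2}$$

$$\begin{aligned}
P[\{(x_1', \ldots, x_n') : a_1 < x_1' \le a_1 + h_1, \ldots, a_n < x_n' \le a_n + h_n\}]
&= F(a_1 + h_1, a_2 + h_2, \ldots, a_n + h_n)\\
&\quad - F(a_1, a_2 + h_2, \ldots, a_n + h_n) - \cdots - F(a_1 + h_1, \ldots, a_{n-1} + h_{n-1}, a_n)\\
&\quad + \cdots + (-1)^n F(a_1, a_2, \ldots, a_n)\\
&= \sum_{\varepsilon \in \{0,1\}^n} (-1)^{n - (\varepsilon_1 + \cdots + \varepsilon_n)} F(a_1 + \varepsilon_1 h_1, \ldots, a_n + \varepsilon_n h_n)
\end{aligned} \tag{7.3}$$

$$p(x_1, x_2, \ldots, x_n) = P[\{(x_1', \ldots, x_n') : x_1' = x_1,\ x_2' = x_2,\ \ldots,\ x_n' = x_n\}] \tag{7.4}$$

$$P[B] = \sum_{\substack{\text{over } (x_1, \ldots, x_n) \text{ lying in } B \\ \text{such that } p(x_1, \ldots, x_n) > 0}} p(x_1, x_2, \ldots, x_n) \tag{7.5}$$

$$P[B] = \int \cdots \int_B f(x_1, x_2, \ldots, x_n)\,dx_1\,dx_2 \cdots dx_n \tag{7.6}$$

$$F(x_1, x_2, \ldots, x_n) = \int_{-\infty}^{x_n} \cdots \int_{-\infty}^{x_1} f(x_1', \ldots, x_n')\,dx_1' \cdots dx_n' \tag{7.7}$$

$$f(x_1, x_2, \ldots, x_n) = \frac{\partial^n}{\partial x_1\,\partial x_2 \cdots \partial x_n} F(x_1, x_2, \ldots, x_n) \tag{7.8}$$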
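The n-tuple form of (7.3) is an alternating sum over the 2ⁿ corners of the box. For illustration, the sketch below assumes a distribution function F of n independent components, each uniform on [0, 1]. For that law, the box probability should equal the product of the side lengths after clipping the box to the unit cube. `box_prob` and `F_unit_cube` are hypothetical names.

```python
from itertools import product
from math import prod

def box_prob(F, a, h):
    """P[a_i < x_i' <= a_i + h_i for i = 1..n] from the distribution function F, via the
    n-tuple form of (7.3): sum over the 2^n corners a + eps*h (each eps_i in {0, 1}),
    weighted by (-1)^(n - eps_1 - ... - eps_n)."""
    n = len(a)
    total = 0.0
    for eps in product((0, 1), repeat=n):
        corner = [ai + e * hi for ai, e, hi in zip(a, eps, h)]
        total += (-1) ** (n - sum(eps)) * F(corner)
    return total

def F_unit_cube(x):
    """Assumed example: n independent components, each uniform on [0, 1]."""
    return prod(min(max(xi, 0.0), 1.0) for xi in x)

# Compare with the product of side lengths 0.5 * 0.4 * 0.2.
print(box_prob(F_unit_cube, [0.1, 0.2, 0.3], [0.5, 0.4, 0.2]))
```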


There are many other notions that arise in connection with numerical 
n-tuple valued random phenomena, but they are best formulated in terms 
of random variables and consequently are discussed in Chapter 7. 


EXERCISES 


7.1. Let, for some finite constants a, b, c, and K,

$$f(x_1, x_2) = K \exp\left[-\tfrac{1}{2}\left(a x_1^2 + 2b\,x_1 x_2 + c\,x_2^2\right)\right].$$

Show that in order for f(x₁, x₂) to be the probability density function of a 2-tuple valued random phenomenon it is necessary and sufficient that the constants a, b, c, and K satisfy the conditions a > 0, c > 0, b² − ac < 0,

$$K = \frac{1}{2\pi}\sqrt{ac - b^2}.$$
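The normalization in exercise 7.1 can be checked numerically for particular constants. This checks one case only and does not replace the requested proof. The sketch assumes a = 2, b = 0.5, c = 1, so that a > 0, c > 0, b² − ac < 0. It approximates the double integral by a midpoint rule on the square [−10, 10]², which carries essentially all of the mass for these constants.

```python
from math import exp, pi, sqrt

def total_mass(a, b, c, K, L=10.0, step=0.05):
    """Midpoint-rule approximation of the integral of
    K * exp(-(a*x1**2 + 2*b*x1*x2 + c*x2**2) / 2) over the square [-L, L] x [-L, L]."""
    n = int(round(2 * L / step))
    s = 0.0
    for i in range(n):
        x1 = -L + (i + 0.5) * step
        for j in range(n):
            x2 = -L + (j + 0.5) * step
            s += exp(-(a * x1 * x1 + 2 * b * x1 * x2 + c * x2 * x2) / 2)
    return K * s * step * step

a, b, c = 2.0, 0.5, 1.0               # satisfy a > 0, c > 0, b**2 - a*c < 0
K = sqrt(a * c - b * b) / (2 * pi)    # the constant stated in the exercise
print(total_mass(a, b, c, K))         # close to 1
```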


7.2. An urn contains M balls, numbered 1 to M. Two balls are drawn, one after the other, with replacement (without replacement). Consider the 2-tuple valued random phenomenon (x₁, x₂), in which x₁ is the number on the first ball drawn and x₂ is the number on the second ball drawn. Find the probability mass function of this 2-tuple valued random phenomenon and show that its probability law is discrete.

7.3. Consider a square sheet of tin, 20 inches wide, that contains 10 rows and 10 columns of circular holes, each 1 inch in diameter, with centers evenly spaced at a distance 2 inches apart.

(i) What is the probability that a particle of sand (considered as a point) blown against the tin sheet will fall upon 1 of the holes and thus pass through?

(ii) What is the probability that a ball of diameter 3 inch thrown upon the 
sheet will pass through 1 of the holes without touching the tin sheet? 
Assume an appropriate uniform probability law. 


CHAPTER 5 


Mean and Variance 
of a Probability Law 


It has been emphasized that in order to describe a numerical valued random phenomenon one must specify its probability function P[·] or, equivalently, its distribution function F(·). In the special case in which the random phenomenon obeys a discrete or a continuous probability law its probability function is determined by a knowledge of the probability mass function p(·) or of the probability density function f(·). Thus, to describe a numerical valued random phenomenon, certain functions must be specified. It is desirable to be able to summarize some of the outstanding features of the probability law of a numerical valued random phenomenon by specifying only a few numbers rather than an entire function. Such numbers are provided by the expectation of various functions g(·) with respect to the probability law of the random phenomenon.


1. THE NOTION OF AN AVERAGE 


In order to motivate our definition of the notion of expectation, let us first discuss the meaning of the word "average." Given a set of quantities, which we denote by x₁, x₂, ..., xₙ, we define their average, often denoted by x̄, as the sum of the quantities divided by n; in symbols

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i. \tag{1.1}$$




200 MEAN AND VARIANCE OF A PROBABILITY LAW CH. 5 


The quantity x̄ is also called the arithmetic mean of the numbers x₁, x₂, ..., xₙ.

For example, consider the scores on an examination of a class of 20 students:

$$\{10, 10, 10, 10, 9, 9, 9, 9, 9, 8, 8, 8, 8, 8, 7, 7, 6, 5, 5, 5\} \tag{1.2}$$

The average of these scores is 160/20 = 8.

Very often, a set of n numbers, x₁, x₂, ..., xₙ, which is to be averaged, may be described in the following way. There are k real numbers, which we may denote by x₁′, x₂′, ..., x_k′, and k integers, n₁, n₂, ..., n_k (whose sum is n), such that the set of numbers {x₁, x₂, ..., xₙ} consists of n₁ repetitions of the number x₁′, n₂ repetitions of the number x₂′, and so on, up to n_k repetitions of the number x_k′. Thus the set of scores in (1.2) may be described by the following table:

(1.3)   Possible values x_i′ in the set                  | 10   9   8   7   6   5
        Number n_i of occurrences of x_i′ in the set     |  4   5   5   2   1   3


In terms of this notation, the average defined by (1.1) may be written

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{k} x_i'\, n_i. \tag{1.4}$$

We may go one step further. Let us define the quantity

$$f(x_i') = \frac{n_i}{n}, \tag{1.5}$$

that represents the fraction of the set of numbers {x₁, x₂, ..., xₙ} which is equal to the number x_i′. Then (1.4) becomes

$$\bar{x} = \sum_{i=1}^{k} x_i'\, f(x_i'). \tag{1.6}$$

In words, we may read (1.6) as follows: the average x̄ of a set of numbers x₁, x₂, ..., xₙ is equal to the sum, over the numbers x₁′, x₂′, ..., x_k′ which occur in the set {x₁, x₂, ..., xₙ}, of the product of the value x_i′ and the fraction f(x_i′); f(x_i′) is the fraction of numbers in the set {x₁, x₂, ..., xₙ} which are equal to x_i′.
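The three forms (1.1), (1.4), and (1.6) of the average can be compared on the scores (1.2) in a few lines of Python. This is a minimal sketch. Exact rational arithmetic (`fractions.Fraction`) is used only so that the three results compare equal exactly, and `collections.Counter` supplies the counts n_i.

```python
from collections import Counter
from fractions import Fraction

scores = [10, 10, 10, 10, 9, 9, 9, 9, 9, 8, 8, 8, 8, 8, 7, 7, 6, 5, 5, 5]  # the data (1.2)
n = len(scores)

avg_1_1 = Fraction(sum(scores), n)                                  # (1.1): sum divided by n

counts = Counter(scores)                                            # n_i for each distinct x_i'
avg_1_4 = Fraction(sum(x * n_i for x, n_i in counts.items()), n)    # (1.4)

freq = {x: Fraction(n_i, n) for x, n_i in counts.items()}           # f(x_i') = n_i / n, as in (1.5)
avg_1_6 = sum(x * f_x for x, f_x in freq.items())                   # (1.6)

print(avg_1_1, avg_1_4, avg_1_6, dict(counts))
```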

The question naturally arises as to the meaning to be assigned to the 
average of a set of numbers. It seems clear that the average of a set of 
numbers is computed for the purpose of summarizing the data represented 


SEC. 1 THE NOTION OF AN AVERAGE 201 


by the set of numbers, so as to better comprehend it. Given the examina- 
tion scores of a large number of students, it is difficult to form an opinion 
as to how well the students performed, except perhaps by forming averages. 

However, it is also clear that the average of a set of numbers, as defined by (1.1) or (1.6), does not serve to summarize the data completely. Consider a second group of twenty students who, in the same examination on which the scores in (1.2) were obtained, gave the following performance:

(1.7)   Scores x_i′                                      | 10   9   8   7   6   5
        Number n_i of students scoring the score x_i′    |  3   5   6   2   3   1

The average of this set of scores is 8, as it would have been if the scores had been

(1.8)   Scores x_i′                                      | 10   9   8   7   6   5
        Number n_i of students scoring the score x_i′    |  3   3   8   3   3   0

Consequently, if we are to summarize these collections of data, we shall require more than the average, in the sense of (1.6), to do it.

The average, in the sense of (1.6), is a measure of what might be called the mid-point, or mean, of the data, about which the numbers in the data are, loosely speaking, "centered." More precisely, the mean x̄ represents the center of gravity of a long rod on which masses f(x₁′), f(x₂′), ..., f(x_k′) have been placed at the points x₁′, x₂′, ..., x_k′, respectively.

Perhaps another characteristic of the data for which one should have a measure is its spread or dispersion about the mean. Of course, it is not clear how this measure should be defined. The dispersion might be defined as the average of the absolute value of the deviation of each number in the set from the mean x̄; in symbols,

$$\text{absolute dispersion} = \sum_{i=1}^{k} |x_i' - \bar{x}|\, f(x_i'). \tag{1.9}$$

The value of the expression (1.9) for the data in (1.3), (1.7), and (1.8) is equal to 1.3, 1.1, and 0.9, respectively, where in each case the mean x̄ = 8. Another possible measure of the spread of the data is the average of the squares of the deviation from the mean x̄ of each number x_i in the set; in symbols,

$$\text{square dispersion} = \sum_{i=1}^{k} (x_i' - \bar{x})^2 f(x_i'), \tag{1.10}$$

which has the values 2.7, 2.0, and 1.5 for the data in (1.3), (1.7), and (1.8), respectively.

Next, one may desire a measure for the symmetry of the distribution of the scores about their mean, for which purpose one might take the average of the cubes of the deviation of each number in the set from the mid-point x̄ (= 8); in symbols,

$$\sum_{i=1}^{k} (x_i' - \bar{x})^3 f(x_i'), \tag{1.11}$$

which has the values −2.7, −1.2, and 0 for the data in (1.3), (1.7), and (1.8), respectively.

From the foregoing discussion one conclusion emerges clearly. Given data {x₁, x₂, ..., xₙ}, there are many kinds of averages one can define, depending on the particular aspect of the data in which one is interested. Consequently, we cannot speak of the average of a set of numbers. Rather, we must consider some function g(x) of a real variable x; for example, g(x) = x, g(x) = (x − 8)², or g(x) = (x − 8)³. We then define the average of the function g(x) with respect to a set of numbers {x₁, x₂, ..., xₙ} as

$$\frac{1}{n}\sum_{j=1}^{n} g(x_j) = \sum_{i=1}^{k} g(x_i')\, f(x_i'), \tag{1.12}$$

in which the numbers x₁′, ..., x_k′ occur in the proportions f(x₁′), ..., f(x_k′) in the set {x₁, x₂, ..., xₙ}.
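The dispersion values quoted for (1.3), (1.7), and (1.8) can be recomputed with the frequency form of (1.12). The sketch below is a minimal illustration using exact fractions. The frequency tables are those printed above, and `g_average` and `summary` are illustrative helper names.

```python
from fractions import Fraction

# Frequency tables (score: number of students) for the data (1.3), (1.7), and (1.8).
tables = {
    "(1.3)": {10: 4, 9: 5, 8: 5, 7: 2, 6: 1, 5: 3},
    "(1.7)": {10: 3, 9: 5, 8: 6, 7: 2, 6: 3, 5: 1},
    "(1.8)": {10: 3, 9: 3, 8: 8, 7: 3, 6: 3, 5: 0},
}

def g_average(g, table):
    """Average of g over the data, computed in the frequency form of (1.12)."""
    n = sum(table.values())
    return sum(g(x) * Fraction(k, n) for x, k in table.items())

def summary(table):
    """Mean (1.6), absolute dispersion (1.9), square dispersion (1.10), and (1.11)."""
    mean = g_average(lambda x: x, table)
    return (
        mean,
        g_average(lambda x: abs(x - mean), table),
        g_average(lambda x: (x - mean) ** 2, table),
        g_average(lambda x: (x - mean) ** 3, table),
    )

for name, table in tables.items():
    print(name, *summary(table))
```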


EXERCISES 


In each of the following exercises find the average with respect to the data given for these functions: (i) g(x) = x; (ii) g(x) = (x − x̄)², in which x̄ is the answer obtained to question (i); (iii) g(x) = (x − x̄)³; (iv) g(x) = (x − x̄)⁴; (v) g(x) = |x − x̄|. Hint: First compute the number of times each number appears in the data.


1.1. The number of rainy days in a certain town during the month of January
for the years 1950-1959 was as follows: 


Year | 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 


Number of rainy 


days in January 9 21 16 16 9 13 9 8 21 


1.2. Record the last digits of the last 20 telephone numbers appearing on the 
first page of your local telephone directory. 


SEC. 2 EXPECTATION OF A FUNCTION 203 


1.3. Ten light bulbs were subjected to a forced life test. Their lifetimes were 
found to be (to the nearest 10 hours) 
850, 1090, 1150, 940, 1150, 960, 1040, 920, 1040, 960. 


1.4. An experiment consists of drawing 2 balls without replacement from an urn containing 6 balls, numbered 1 to 6, and recording the sum of the 2 numbers drawn. In 30 repetitions of the experiment the sums recorded were (compare example 4A of Chapter 2)
"7g 5 8 STA 93 59 п 9 49 


1. 7 104185161095 7 9 10 10 3. 


2. EXPECTATION OF A FUNCTION WITH RESPECT
TO A PROBABILITY LAW


Consider a numerical valued random phenomenon, with probability function P[·]. The probability function P[·] determines a distribution of a unit mass on the real line, the amount of which lying on any (Borel) set B of real numbers is equal to P[B]. In order to summarize the characteristics of P[·] by a few numbers, we define in this section the notion of the expectation of a continuous function g(x) of a real variable x, with respect to the probability function P[·], to be denoted by E[g(x)]. It will be seen that the expectation E[g(x)] has much the same properties as the average of g(x) with respect to a set of numbers.

For the case in which the probability function P[·] is specified by a probability mass function p(·), we define, in analogy with (1.12),

$$E[g(x)] = \sum_{\substack{\text{over all } x \text{ such} \\ \text{that } p(x) > 0}} g(x)\,p(x). \tag{2.1}$$

The sum written in (2.1) may involve the summation of a countably infinite number of terms and therefore is not always meaningful. For reasons made clear in section 1 of Chapter 8 the expectation E[g(x)] is said to exist if

$$E[|g(x)|] = \sum_{\substack{\text{over all } x \text{ such} \\ \text{that } p(x) > 0}} |g(x)|\,p(x) < \infty. \tag{2.2}$$

In words, the expectation E[g(x)], defined in (2.1), exists if and only if the infinite series defining E[g(x)] is absolutely convergent. A test for convergence of an infinite series is given in theoretical exercise 2.1.
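Condition (2.2) matters when the sum in (2.1) has infinitely many terms. As an illustrative example not taken from the text, consider the law p(k) = 6/(π²k²), k = 1, 2, .... The partial sums of Σ|g(k)|p(k) settle down for g(x) = 1/x; the limit is 6ζ(3)/π² ≈ 0.7308. For g(x) = x they grow without bound, roughly like (6/π²) log N. So for this law E[1/x] exists while the mean E[x] does not. The sketch only displays partial sums; it is not a convergence test.

```python
from math import pi

def p(k):
    """Illustrative pmf p(k) = 6 / (pi^2 k^2), k = 1, 2, ...; it sums to 1 since sum 1/k^2 = pi^2/6."""
    return 6.0 / (pi ** 2 * k ** 2)

def partial_abs_sum(g, N):
    """Partial sum of |g(k)| p(k) for k = 1, ..., N: the series whose convergence (2.2) requires."""
    return sum(abs(g(k)) * p(k) for k in range(1, N + 1))

for N in (10, 1_000, 100_000):
    print(N,
          partial_abs_sum(lambda k: 1.0 / k, N),  # g(x) = 1/x: partial sums settle down
          partial_abs_sum(lambda k: k, N))        # g(x) = x: partial sums keep growing
```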

For the case in which the probability function P[·] is specified by a probability density function f(·), we define

$$E[g(x)] = \int_{-\infty}^{\infty} g(x)\,f(x)\,dx. \tag{2.3}$$




The integral written in (2.3) is an improper integral and therefore is not always meaningful. Before one can speak of the expectation E[g(x)], one must verify its existence. The expectation E[g(x)] is said to exist if

$$E[|g(x)|] = \int_{-\infty}^{\infty} |g(x)|\,f(x)\,dx < \infty. \tag{2.4}$$

In words, the expectation E[g(x)] defined in (2.3) exists if and only if the improper integral defining E[g(x)] is absolutely convergent. In the case in which the functions g(·) and f(·) are continuous for all (but a finite number of values of) x, the integral in (2.3) may be defined as an improper Riemann* integral by the limit

$$\int_{-\infty}^{\infty} g(x)\,f(x)\,dx = \lim_{\substack{a \to -\infty \\ b \to \infty}} \int_a^b g(x)\,f(x)\,dx. \tag{2.5}$$

A useful tool for determining whether or not the expectation E[g(x)], given by (2.3), exists is the test for convergence of an improper integral given in theoretical exercise 2.1.

A discussion of the definition of the expectation in the case in which the 
probability function must be specified by the distribution function is given 
in section 6. 

The expectation E[g(x)] is sometimes called the ensemble average of the 
function g(x) in order to emphasize that the expectation (or ensemble
average) is a theoretically computed quantity. It is not an average of an 
observed set of numbers, as was the case in section 1. We shall later 
consider averages with respect to observed values of random phenomena, 
and these will be called sample averages. 

A special terminology is introduced to describe the expectation E[g(x)] 
of various functions g(x). 

We call E[x], the expectation of the function g(x) = x with respect to a probability law, the mean of the probability law. For a discrete probability law, with probability mass function p(·),

$$E[x] = \sum_{\substack{\text{over all } x \text{ such} \\ \text{that } p(x) > 0}} x\,p(x). \tag{2.6}$$

For a continuous probability law, with probability density function f(·),

$$E[x] = \int_{-\infty}^{\infty} x\,f(x)\,dx. \tag{2.7}$$

* For the benefit of the reader acquainted with the theory of Lebesgue integration, let it be remarked that if the integral in (2.3) is defined as an integral in the sense of Lebesgue then the notion of expectation E[g(x)] may be defined for a Borel function g(·).


It may be shown that the mean of a probability law has the following meaning. Suppose one makes a sequence X₁, X₂, ..., Xₙ, ... of independent observations of a random phenomenon obeying the probability law and forms the successive arithmetic means

$$A_1 = X_1,\quad A_2 = \tfrac{1}{2}(X_1 + X_2),\quad A_3 = \tfrac{1}{3}(X_1 + X_2 + X_3),\quad A_4 = \tfrac{1}{4}(X_1 + X_2 + X_3 + X_4),\ \ldots$$

These successive arithmetic means, A₁, A₂, ..., Aₙ, ..., will (with probability one) tend to a limiting value if and only if the mean of the probability law is finite. Further, this limiting value will be precisely the mean of the probability law.
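A small simulation illustrates this behaviour; it is an illustration, not a proof. The sketch assumes a normal probability law with m = 2 and σ = 2, as in Example 6B. It generates observations with Python's `random.Random.gauss`. The seed and the sample size are arbitrary choices.

```python
import random

def running_means(observations):
    """Return the successive arithmetic means A_1, A_2, ..., A_n of the observations."""
    means = []
    total = 0.0
    for count, x in enumerate(observations, start=1):
        total += x
        means.append(total / count)
    return means

rng = random.Random(2024)                             # arbitrary seed, for reproducibility
obs = [rng.gauss(2.0, 2.0) for _ in range(200_000)]   # assumed normal law with m = 2, sigma = 2
A = running_means(obs)

for n in (10, 1_000, 100_000, 200_000):
    print(n, A[n - 1])
```

The early means wander noticeably; the later ones cluster near m = 2.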
We call E[x²], the expectation of the function g(x) = x² with respect to a probability law, the mean square of the probability law. This notion is not to be confused with the square mean of the probability law, which is the square (E[x])² of the mean and which we denote by E²[x]. For a discrete probability law, with probability mass function p(·),

$$E[x^2] = \sum_{\substack{\text{over all } x \text{ such} \\ \text{that } p(x) > 0}} x^2\,p(x). \tag{2.8}$$

For a continuous probability law, with probability density function f(·),

$$E[x^2] = \int_{-\infty}^{\infty} x^2\,f(x)\,dx. \tag{2.9}$$

More generally, for any integer n = 1, 2, 3, ..., we call E[xⁿ], the expectation of g(x) = xⁿ with respect to a probability law, the nth moment of the probability law. Note that the first moment and the mean of a probability law are the same; also, the second moment and the mean square of a probability law are the same.

Next, for any real number c and integer n = 1, 2, 3, ..., we call E[(x − c)ⁿ] the nth moment of the probability law about the point c. Of especial interest is the case in which c is equal to the mean E[x]. We call E[(x − E[x])ⁿ] the nth moment of the probability law about its mean or, more briefly, the nth central moment of the probability law.

The second central moment E[(x − E[x])²] is especially important and is called the variance of the probability law. Given a probability law, we shall use the symbols m and σ² to denote its mean and variance, respectively; consequently,

$$m = E[x], \qquad \sigma^2 = E[(x - m)^2]. \tag{2.10}$$




The square root σ of the variance is called the standard deviation of the
probability law. The intuitive meaning of the variance is discussed in 
section 4. 
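The moment definitions translate directly into code for a discrete law. The sketch below assumes the law is given as a dictionary {x: p(x)} with exact `Fraction` probabilities. `moment` and `central_moment` are illustrative names. For a fair die, an assumed example, it gives m = 7/2, E[x²] = 91/6, and σ² = 35/12.

```python
from fractions import Fraction

def moment(pmf, n, c=0):
    """nth moment about the point c, E[(x - c)^n], for a discrete law given as {x: p(x)}."""
    return sum(p * (x - c) ** n for x, p in pmf.items() if p > 0)

def central_moment(pmf, n):
    """nth central moment E[(x - E[x])^n]."""
    return moment(pmf, n, c=moment(pmf, 1))

die = {k: Fraction(1, 6) for k in range(1, 7)}  # assumed example: a fair die

m = moment(die, 1)                  # mean
mean_square = moment(die, 2)        # E[x^2]
variance = central_moment(die, 2)   # sigma^2
print(m, mean_square, variance, float(variance) ** 0.5)
```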


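▶ Example 2A. The normal probability law with parameters m and σ is specified by the probability density function f(·), given by (4.11) of Chapter 4. Its mean is equal to

$$E[x] = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} x\, e^{-\frac{1}{2}\left(\frac{x-m}{\sigma}\right)^2} dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} (m + \sigma y)\, e^{-\frac{1}{2}y^2}\,dy, \tag{2.11}$$

where we have made the change of variable y = (x − m)/σ. Now

$$\int_{-\infty}^{\infty} e^{-\frac{1}{2}y^2}\,dy = \int_{-\infty}^{\infty} y^2 e^{-\frac{1}{2}y^2}\,dy = \sqrt{2\pi}, \qquad \int_{-\infty}^{\infty} y\, e^{-\frac{1}{2}y^2}\,dy = 0. \tag{2.12}$$

Equation (2.12) follows from (2.20) and (2.22) of Chapter 4 and the fact that for any integrable function h(y)

$$\int_{-\infty}^{\infty} h(y)\,dy = 0 \quad \text{if } h(-y) = -h(y), \qquad \int_{-\infty}^{\infty} h(y)\,dy = 2\int_0^{\infty} h(y)\,dy \quad \text{if } h(-y) = h(y). \tag{2.13}$$

From (2.12) and (2.13) it follows that the mean E[x] is equal to m. Next, the variance is equal to

$$E[(x - m)^2] = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} (x - m)^2 e^{-\frac{1}{2}\left(\frac{x-m}{\sigma}\right)^2} dx = \sigma^2\, \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} y^2 e^{-\frac{1}{2}y^2}\,dy = \sigma^2. \tag{2.14}$$

Notice that the parameters m and σ in the normal probability law were chosen equal to the mean and standard deviation of the probability law. ◀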
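The conclusions of Example 2A can be checked numerically. In the spirit of (2.5), the sketch replaces each improper integral by an integral over a finite interval. It assumes m = 1.5 and σ = 0.7, arbitrary values. It uses a plain midpoint rule over m ± 12σ, a range outside of which the normal density is negligible; both choices are illustrative and not part of the text.

```python
from math import exp, pi, sqrt

def integrate(h, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of h over [a, b] with n subintervals."""
    step = (b - a) / n
    return step * sum(h(a + (i + 0.5) * step) for i in range(n))

def normal_density(m, sigma):
    """Normal density with parameters m and sigma."""
    return lambda x: exp(-0.5 * ((x - m) / sigma) ** 2) / (sigma * sqrt(2 * pi))

m, sigma = 1.5, 0.7                       # arbitrary illustrative parameters
f = normal_density(m, sigma)
lo, hi = m - 12 * sigma, m + 12 * sigma   # finite stand-in for the limits in (2.5)

mean = integrate(lambda x: x * f(x), lo, hi)
variance = integrate(lambda x: (x - mean) ** 2 * f(x), lo, hi)
gauss_integral = integrate(lambda y: exp(-0.5 * y * y), -12.0, 12.0)  # first integral in (2.12)

print(mean, variance, gauss_integral, sqrt(2 * pi))
```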


The operation of taking expectations has certain basic properties with 
which one may perform various formal manipulations. To begin with, 
we have the following properties for any constant c and any functions g(x), g₁(x), and g₂(x) whose expectations exist:

$$E[c] = c. \tag{2.15}$$

$$E[c\,g(x)] = c\,E[g(x)]. \tag{2.16}$$

$$E[g_1(x) + g_2(x)] = E[g_1(x)] + E[g_2(x)]. \tag{2.17}$$

$$E[g_1(x)] \le E[g_2(x)] \quad \text{if } g_1(x) \le g_2(x) \text{ for all } x. \tag{2.18}$$

$$|E[g(x)]| \le E[|g(x)|]. \tag{2.19}$$




In words, the first three of these properties may be stated as follows: the expectation of a constant c [that is, of the function g(x), which is equal to c for every value of x] is equal to c; the expectation of the product of a constant and a function is equal to the constant multiplied by the expectation of the function; the expectation of a function which is the sum of two functions is equal to the sum of the expectations of the two functions.

Equations (2.15) to (2.19) are immediate consequences of the definition of expectation. We write out the details only for the case in which the expectations are taken with respect to a continuous probability law with probability density function f(·). Then, by the properties of integrals,

$$\begin{aligned}
E[c] &= \int_{-\infty}^{\infty} c\,f(x)\,dx = c\int_{-\infty}^{\infty} f(x)\,dx = c,\\
E[c\,g(x)] &= \int_{-\infty}^{\infty} c\,g(x)\,f(x)\,dx = c\int_{-\infty}^{\infty} g(x)\,f(x)\,dx = c\,E[g(x)],\\
E[g_1(x) + g_2(x)] &= \int_{-\infty}^{\infty} \left(g_1(x) + g_2(x)\right) f(x)\,dx\\
&= \int_{-\infty}^{\infty} g_1(x)\,f(x)\,dx + \int_{-\infty}^{\infty} g_2(x)\,f(x)\,dx = E[g_1(x)] + E[g_2(x)],\\
E[g_2(x)] - E[g_1(x)] &= \int_{-\infty}^{\infty} \left(g_2(x) - g_1(x)\right) f(x)\,dx \ge 0.
\end{aligned}$$

Equation (2.19) follows from (2.18), applied first with g₁(x) = g(x) and g₂(x) = |g(x)| and then with g₁(x) = −|g(x)| and g₂(x) = g(x).


p> Example 2B. To illustrate the use of (2.15) to (2.19), we note that 

E[4] = 4, Ela? — 4x] = Ef] — 4E[x], and E[(x — 2)] = Ef? — 4a + 4] 

= Е[а?] — 4E[«] + 4. 4 
We next derive an extremely important expression for the variance of a 

probability law: 

(2.20) o? = Е[(х — E[x]?] = Е[а2] — E?[w]. 


In words, the variance of a probability law is equal to its mean square, minus
its square mean. To prove (2.20), we write, letting m = E[x],

$\sigma^2 = E[x^2 - 2mx + m^2] = E[x^2] - 2mE[x] + m^2 = E[x^2] - 2m^2 + m^2 = E[x^2] - m^2.$
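As a small numerical check of (2.20), the Python sketch below computes the variance of a discrete law in two ways: once from the definition E[(x − m)²], and once as the mean square minus the square mean. The fair six-sided die is an illustrative choice and is not an example from the text. The standard `fractions` module is used so that the two results can be compared exactly.

```python
from fractions import Fraction

def moments(pmf):
    """Return (mean, mean square, variance) of a discrete law given as {x: p(x)}."""
    mean = sum(x * p for x, p in pmf.items())
    mean_square = sum(x * x * p for x, p in pmf.items())
    # Variance from the definition, E[(x - m)^2] ...
    var_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())
    # ... and from (2.20): mean square minus square mean.
    var_shortcut = mean_square - mean ** 2
    assert var_definition == var_shortcut
    return mean, mean_square, var_shortcut

die = {k: Fraction(1, 6) for k in range(1, 7)}  # illustrative law: a fair die
print(moments(die))  # (Fraction(7, 2), Fraction(91, 6), Fraction(35, 12))
```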


208 MEAN AND VARIANCE OF A PROBABILITY LAW CH. 5 
In the remainder of this section we compute the mean and variance of 
various probability laws. A tabulation of the results obtained is given in 
Tables 3A and 3B at the end of section 3. 
► Example 2C. The Bernoulli probability law with parameter p, in which
0 < p < 1, is specified by the probability mass function p(·), given by
p(0) = 1 − p, p(1) = p, p(x) = 0 for x ≠ 0 or 1. Its mean, mean square,
and variance, letting q = 1 − p, are given by

(2.21)  $E[x] = 0\cdot q + 1\cdot p = p, \qquad E[x^2] = 0^2\cdot q + 1^2\cdot p = p,$
$\sigma^2 = E[x^2] - m^2 = p - p^2 = pq.$ ◄
► Example 2D. The binomial probability law with parameters n and p
is specified by the probability mass function given by (4.5) of Chapter 4.
Its mean is given by

(2.22)  $E[x] = \sum_{k=0}^{n} k\,p(k) = \sum_{k=0}^{n} k\binom{n}{k} p^k q^{n-k} = np\sum_{k=1}^{n} \binom{n-1}{k-1} p^{k-1} q^{(n-1)-(k-1)} = np(p+q)^{n-1} = np.$

Its mean square is given by

(2.23)  $E[x^2] = \sum_{k=0}^{n} k^2 \binom{n}{k} p^k q^{n-k}.$

To evaluate E[x²], we write k² = k(k − 1) + k. Then

(2.24)  $E[x^2] = \sum_{k=0}^{n} k(k-1)\binom{n}{k} p^k q^{n-k} + \sum_{k=0}^{n} k\binom{n}{k} p^k q^{n-k}.$

Since $k(k-1)\binom{n}{k} = n(n-1)\binom{n-2}{k-2}$, the first sum in (2.24) is equal to

$n(n-1)p^2 \sum_{k=2}^{n} \binom{n-2}{k-2} p^{k-2} q^{(n-2)-(k-2)} = n(n-1)p^2(p+q)^{n-2}.$

Consequently, E[x²] = n(n − 1)p² + np, so that

(2.25)  $E[x^2] = npq + n^2p^2, \qquad \sigma^2 = E[x^2] - E^2[x] = npq.$ ◄
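The sums in (2.22) to (2.25) can be checked directly. The sketch below evaluates them in exact rational arithmetic for one illustrative choice of parameters (n = 10, p = 3/10, not taken from the text). It then confirms that the results match np and npq.

```python
from fractions import Fraction
from math import comb

def binomial_moments(n, p):
    """Exact mean and variance of the binomial law by direct summation of (2.22), (2.23)."""
    q = 1 - p
    mean = Fraction(0)
    mean_square = Fraction(0)
    for k in range(n + 1):
        pk = comb(n, k) * p ** k * q ** (n - k)   # binomial probability p(k)
        mean += k * pk
        mean_square += k * k * pk
    return mean, mean_square - mean ** 2

n, p = 10, Fraction(3, 10)   # illustrative parameter values
mean, var = binomial_moments(n, p)
print(mean, var)  # 3 21/10
```

The same loop also lets one confirm the identity k(k − 1)C(n, k) = n(n − 1)C(n − 2, k − 2) used after (2.24).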




► Example 2E. The hypergeometric probability law with parameters
N, n, and p is specified by the probability mass function p(·) given by (4.8)
of Chapter 4. Its mean is given by

(2.26)  $E[x] = \sum_{k=0}^{n} k\,\frac{\binom{a}{k}\binom{b}{n-k}}{\binom{N}{n}} = \frac{a}{\binom{N}{n}} \sum_{k=1}^{n} \binom{a-1}{k-1}\binom{b}{n-k},$

in which we have let a = Np, b = Nq. Now, letting j = k − 1 and using
(2.37) of Chapter 4, the last sum written is equal to

$\sum_{j=0}^{n-1} \binom{a-1}{j}\binom{b}{n-1-j} = \binom{a+b-1}{n-1} = \binom{N-1}{n-1}.$

Consequently,

(2.27)  $E[x] = Np\,\frac{\binom{N-1}{n-1}}{\binom{N}{n}} = np.$

Next, we evaluate E[x²] by first evaluating E[x(x − 1)] and then using the
fact that E[x²] = E[x(x − 1)] + E[x]. Now

(2.28)  $E[x(x-1)] = \sum_{k=0}^{n} k(k-1)\,\frac{\binom{a}{k}\binom{b}{n-k}}{\binom{N}{n}} = \frac{a(a-1)}{\binom{N}{n}} \sum_{k=2}^{n} \binom{a-2}{k-2}\binom{b}{n-k} = a(a-1)\,\frac{\binom{N-2}{n-2}}{\binom{N}{n}} = Np(Np-1)\,\frac{n(n-1)}{N(N-1)}.$

Therefore

$E[x^2] = np\,\frac{(Np-1)(n-1)}{N-1} + np,$
$\sigma^2 = E[x^2] - E^2[x] = np\left[\frac{(Np-1)(n-1)}{N-1} + 1 - np\right] = npq\,\frac{N-n}{N-1}.$

Notice that the mean of the hypergeometric probability law is the same as
that of the corresponding binomial probability law, whereas the variances
differ by the factor (N − n)/(N − 1), which is approximately equal to 1 if the
ratio n/N is a small number. ◄
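The following sketch checks (2.27) and the variance formula numerically, using exact fractions. The values N = 20, n = 5, a = Np = 8 are illustrative assumptions. Printing npq alongside the hypergeometric variance makes the finite-population factor (N − n)/(N − 1) visible.

```python
from fractions import Fraction
from math import comb

def hypergeometric_moments(N, n, a):
    """Exact mean and variance of the hypergeometric law: a sample of size n drawn
    without replacement from N items, a = Np of which count as successes."""
    b = N - a
    total = comb(N, n)
    mean = Fraction(0)
    mean_square = Fraction(0)
    for k in range(n + 1):
        # math.comb returns 0 when k > a or n - k > b, so impossible k contribute 0.
        pk = Fraction(comb(a, k) * comb(b, n - k), total)
        mean += k * pk
        mean_square += k * k * pk
    return mean, mean_square - mean ** 2

N, n, a = 20, 5, 8           # illustrative values: p = 8/20 = 2/5
p = Fraction(a, N)
q = 1 - p
mean, var = hypergeometric_moments(N, n, a)
print(mean, var, n * p * q)  # 2 18/19 6/5
```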




► Example 2F. The uniform probability law over the interval a to b has
probability density function f(·) given by (4.10) of Chapter 4. Its mean,
mean square, and variance are given by

(2.29)  $E[x] = \int_{-\infty}^{\infty} x f(x)\,dx = \frac{1}{b-a}\int_a^b x\,dx = \frac{b^2 - a^2}{2(b-a)} = \frac{b+a}{2},$
$E[x^2] = \int_{-\infty}^{\infty} x^2 f(x)\,dx = \frac{1}{b-a}\int_a^b x^2\,dx = \frac{1}{3}(b^2 + ba + a^2),$
$\sigma^2 = E[x^2] - E^2[x] = \frac{1}{12}(b-a)^2.$

Note that the variance of the uniform probability law depends only on the
length of the interval, whereas the mean is equal to the mid-point of the
interval. The higher moments of the uniform probability law are also
easily obtained:

(2.30)  $E[x^n] = \frac{1}{b-a}\int_a^b x^n\,dx = \frac{b^{n+1} - a^{n+1}}{(n+1)(b-a)}.$ ◄
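Formula (2.30) can be compared against a direct numerical integration, as in the sketch below. The midpoint rule and the interval (2, 5) are illustrative choices, not part of the text. The printed line also confirms that the variance obtained from (2.29) equals (b − a)²/12.

```python
def uniform_moment_exact(n, a, b):
    """E[x^n] for the uniform law on (a, b), from (2.30)."""
    return (b ** (n + 1) - a ** (n + 1)) / ((n + 1) * (b - a))

def uniform_moment_numeric(n, a, b, steps=100_000):
    """Midpoint-rule approximation of (1/(b-a)) * integral_a^b x^n dx."""
    h = (b - a) / steps
    return sum((a + (i + 0.5) * h) ** n for i in range(steps)) * h / (b - a)

a, b = 2.0, 5.0                       # illustrative interval
mean = uniform_moment_exact(1, a, b)
var = uniform_moment_exact(2, a, b) - mean ** 2
print(mean, var, (b - a) ** 2 / 12)   # 3.5 0.75 0.75
```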


► Example 2G. The Cauchy probability law with parameters α = 0 and
β = 1 is specified by the probability density function

(2.31)  $f(x) = \frac{1}{\pi(1 + x^2)}.$

The mean E[x] of the Cauchy probability law does not exist, since

(2.32)  $E[|x|] = \frac{1}{\pi}\int_{-\infty}^{\infty} \frac{|x|}{1 + x^2}\,dx = \infty.$

However, for r < 1 the rth absolute moments

(2.33)  $E[|x|^r] = \frac{1}{\pi}\int_{-\infty}^{\infty} \frac{|x|^r}{1 + x^2}\,dx$

do exist, as one may see by applying theoretical exercise 2.1. ◄
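To see (2.32) concretely, truncate the integral at ±M. A direct computation gives (1/π) log(1 + M²), which grows without bound as M increases. The sketch compares this closed form with a midpoint-rule approximation. The cutoffs M and the step count are illustrative choices.

```python
import math

def cauchy_abs_integrand(x):
    """|x| f(x) for the Cauchy density f(x) = 1 / (pi (1 + x^2)), with x >= 0."""
    return x / (math.pi * (1 + x * x))

def truncated_abs_mean(M, steps=100_000):
    """Midpoint-rule value of the integral of |x| f(x) over (-M, M).

    By symmetry this is twice the integral over (0, M)."""
    h = M / steps
    half = sum(cauchy_abs_integrand((i + 0.5) * h) for i in range(steps)) * h
    return 2 * half

for M in (10, 100, 1000):
    closed_form = math.log(1 + M * M) / math.pi  # (1/pi) log(1 + M^2)
    print(M, round(truncated_abs_mean(M), 4), round(closed_form, 4))
```

The truncated values keep increasing (roughly like (2/π) log M), which is why E[|x|], and hence the mean, fails to exist.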


THEORETICAL EXERCISES 


2.1. Test for convergence or divergence of infinite series and improper integrals.
Prove the following statements. Let h(x) be a continuous function. If,
for some real number r > 1, the limits

(2.34)  $\lim_{x \to \infty} |x|^r |h(x)|, \qquad \lim_{x \to -\infty} |x|^r |h(x)|$

both exist and are finite, then

(2.35)  $\int_{-\infty}^{\infty} h(x)\,dx, \qquad \sum_{k=-\infty}^{\infty} h(k)$

converge absolutely; if, for some r ≤ 1, either of the limits in (2.34)
exists and is not equal to 0, then the expressions in (2.35) fail to converge
absolutely.

2.2. Pareto's distribution with parameters r and A, in which r and A are
positive, is defined by the probability density function

(2.36)  $f(x) = \frac{rA^r}{x^{r+1}}$ for x ≥ A,
        $f(x) = 0$ for x < A.

Show that Pareto's distribution possesses a finite nth moment if and only
if n < r. Find the mean and variance of Pareto's distribution in the cases
in which they exist.

2.3. “Student's” t-distribution with parameter ν > 0 is defined as the
continuous probability law specified by the probability density function

(2.37)  $f(x) = \frac{1}{\sqrt{\nu\pi}}\,\frac{\Gamma[(\nu+1)/2]}{\Gamma(\nu/2)}\left(1 + \frac{x^2}{\nu}\right)^{-(\nu+1)/2}.$

Note that “Student's” t-distribution with parameter ν = 1 coincides with
the Cauchy probability law given by (2.31). Show that for “Student's”
t-distribution with parameter ν (i) the nth moment E[xⁿ] exists only for
n < ν, (ii) if n < ν and n is odd, then E[xⁿ] = 0, (iii) if n < ν and n is
even, then

(2.38)  $E[x^n] = \nu^{n/2}\,\frac{\Gamma[(n+1)/2]\,\Gamma[(\nu-n)/2]}{\Gamma(1/2)\,\Gamma(\nu/2)}.$

Hint: Use (2.41) and (2.42) in Chapter 4.

2.4. A characterization of the mean. Consider a probability law with finite
mean m. Define, for every real number a, h(a) = E[(x − a)²]. Show that
h(a) = E[(x − m)²] + (m − a)². Consequently h(a) is minimized at
a = m, and its minimum value is the variance of the probability law.

2.5. A geometrical interpretation of the mean of a probability law. Show that
for a continuous probability law with probability density function f(·)
and distribution function F(·)

(2.39)  $\int_0^{\infty} [1 - F(x)]\,dx = \int_0^{\infty} dx \int_x^{\infty} f(y)\,dy = \int_0^{\infty} y f(y)\,dy,$
$\int_{-\infty}^{0} F(x)\,dx = \int_{-\infty}^{0} dx \int_{-\infty}^{x} f(y)\,dy = -\int_{-\infty}^{0} y f(y)\,dy.$

Consequently the mean m of the probability law may be written

(2.40)  $m = \int_{-\infty}^{\infty} y f(y)\,dy = \int_0^{\infty} [1 - F(x)]\,dx - \int_{-\infty}^{0} F(x)\,dx.$


These equations may be interpreted geometrically. Plot the graph
y = F(x) of the distribution function on an (x, y)-plane, as in Fig. 2A,
and define the areas I and II as indicated: I is the area to the right of the
y-axis bounded by y = 1 and y = F(x); II is the area to the left of the
y-axis bounded by y = 0 and y = F(x). Then the mean m is equal to
area I, minus area II. Although we have proved this assertion only for
the case of a continuous probability law, it holds for any probability law.

Fig. 2A. The mean of a probability law with distribution function F(·) is equal
to the shaded area to the right of the y-axis, minus the shaded area to the left
of the y-axis.

2.6. A geometrical interpretation of the higher moments. Show that the nth
moment E[xⁿ] of a continuous probability law with distribution function
F(·) can be expressed for n = 1, 2, ... by

(2.41)  $E[x^n] = \int_{-\infty}^{\infty} x^n f(x)\,dx = \int_0^{\infty} n y^{n-1}\,[1 - F(y) + (-1)^n F(-y)]\,dy.$

Use (2.41) to interpret the nth moment in terms of area.

2.7. The relation between the moments and central moments of a probability
law. Show that from a knowledge of the moments of a probability law
one may obtain a knowledge of the central moments, and conversely.
In particular, it is useful to have expressions for the first 4 central moments
in terms of the moments. Show that

(2.42)  $E[(x - E[x])^3] = E[x^3] - 3E[x]E[x^2] + 2E^3[x],$
$E[(x - E[x])^4] = E[x^4] - 4E[x]E[x^3] + 6E^2[x]E[x^2] - 3E^4[x].$

2.8. The square mean is less than or equal to the mean square. Show that

(2.43)  $|E[x]| \le E[|x|] \le E^{1/2}[x^2].$

Give an example of a probability law whose mean square E[x²] is equal
to its square mean.


2.9. The mean is not necessarily greater than or equal to the variance. The
binomial and the Poisson are probability laws having the property that
their mean m is greater than or equal to their variance σ² (show this);
this circumstance has sometimes led to the belief that for the probability
law of a random variable assuming only nonnegative values it is always
true that m ≥ σ². Prove this is not the case by showing that m < σ²
for the probability law of the number of failures up to the first success in
a sequence of independent repeated Bernoulli trials.

2.10. The median of a probability law. The mean of a probability law provides
a measure of the “mid-point” of a probability distribution. Another such
measure is provided by the median of a probability law, denoted by m_e,
which is defined as a number such that

(2.44)  $\lim_{x \to m_e^-} F(x) = F(m_e - 0) \le \tfrac{1}{2} \le F(m_e + 0) = \lim_{x \to m_e^+} F(x).$

If the probability law is continuous, the median m_e may be defined as a
number satisfying $\int_{-\infty}^{m_e} f(x)\,dx = \tfrac{1}{2}$. Thus m_e is the projection on the
x-axis of the point in the (x, y)-plane at which the line y = 1/2 intersects
the curve y = F(x). A more probabilistic definition of the median m_e
is as a number such that P[X < m_e] ≤ 1/2 and P[X > m_e] ≤ 1/2, in which X is
an observed value of a random phenomenon obeying the given probability
law. There may be an interval of points that satisfies (2.44); if this is
the case, we take the mid-point of the interval as the median. Show that
one may characterize the median m_e as a number at which the function
h(a) = E[|x − a|] achieves its minimum value; this minimum value is therefore E[|x − m_e|].
Hint: Although the assertion is true in general, show it only for a
continuous probability law. Show, and use the fact, that for any number a

(2.45)  $E[|x - a|] = E[|x - m_e|] + 2\int_a^{m_e} (x - a) f(x)\,dx.$

2.11. The mode of a continuous or discrete probability law. For a continuous
probability law with probability density function f(x), a mode of the
probability law is defined as a number m_o at which the probability density
has a relative maximum; assuming that the probability density function
is twice differentiable, a point m_o is a mode if f′(m_o) = 0 and f″(m_o) < 0.
Since the probability density function is the derivative of the distribution
function, these conditions may be stated in terms of the distribution
function: a point m_o is a mode if F″(m_o) = 0 and F‴(m_o) < 0. Similarly,
for a discrete probability law with probability mass function p(·), a mode
of the probability law is defined as a number m_o at which the probability
mass function has a relative maximum; more precisely, p(m_o) > p(x)
for x equal to the largest probability mass point less than m_o and for x
equal to the smallest probability mass point larger than m_o. A probability
law is said to be (i) unimodal if it possesses just 1 mode, (ii) bimodal if it
possesses exactly 2 modes, and so on. Give examples of continuous and
discrete probability laws which are (a) unimodal, (b) bimodal. Give
examples of continuous and discrete probability laws for which the mean,
median, and mode (c) coincide, (d) are all different.


2.12. The interquartile range of a probability law. There exist several measures of
the dispersion of a probability distribution, in addition to the variance,
which one may consider (especially if the variance is infinite). The most
important of these is the interquartile range of the probability law, defined
as follows: for any number p between 0 and 1, define the p percentile
u(p) of the probability law as the number satisfying F(u(p) − 0) ≤ p ≤
F(u(p) + 0). Thus u(p) is the projection on the x-axis of the point in the
(x, y)-plane at which the line y = p intersects the curve y = F(x). The
0.5 percentile is usually called the median. The interquartile range,
defined as the difference u(0.75) − u(0.25), may be taken as a measure of
the dispersion of the probability law.

(i) Show that the ratio of the interquartile range to the standard deviation
is (a) 1.3490 for the normal probability law with parameters m and σ,
(b) log_e 3 = 1.099 for the exponential probability law with parameter λ,
(c) √3 for the uniform probability law over the interval a to b.

(ii) Show that the Cauchy probability law specified by the probability
density function f(x) = [π(1 + x²)]⁻¹ possesses neither a mean nor a
variance. However, it possesses a median and an interquartile range given
by m_e = u(1/2) = 0, u(3/4) − u(1/4) = 2.


EXERCISES 


In exercises 2.1 to 2.7, compute the mean and variance of the probability law
specified by the probability density function, probability mass function, or
distribution function given.


2.1. (i) f(x) = 2x for 0 < x < 1
         = 0 elsewhere.
     (ii) f(x) = |x| for |x| < 1
         = 0 elsewhere.
(ii) fe) = 8 ll <1, 
= elsewhere. 
2.2. (i) f(x) = 1 − |1 − x| for 0 < x < 2
=0 elsewhere. 
" 1 
(ii) /@) = EN for0 <v <1 
=0 elsewhere, 
, 1 qz? 
23. (i) @) = —= ( + a 
f тУ3 3 
- 2 qu -a 
Gi) (z) =—=(14+— 
m 7V3 ( J 


о = 8B (у. у 
М " T. 
in pu zl Е 3) 


SEC. 3 MOMENT-GENERATING FUNCTIONS 215 
; 1 -KEE 
2.4. (i x) = — 2 
(i) fe ae” 
(ii) Гб) = VQ[z)e- te forz > 0 
= 0 elsewhere. 
25. (i) pe) =4 for x =0 
= $ for x = 1, 
= 0 elsewhere. 
     (ii) $p(x) = \binom{6}{x}\left(\frac{2}{3}\right)^x\left(\frac{1}{3}\right)^{6-x}$ for x = 0, 1, ..., 6
         = 0 elsewhere.
8\/ 4 
Р MA 
(iii) pe = 15 for» 20,1,:::,6 
(6) 
=0 elsewhere. 
.6. (i) peo) =з 3 ог = 1,2, 
=0 otherwise. 
ох 
(ii) pe = er for x = 0, 1,2, ++: 
=0 otherwise. 
27. (і Е@) =0 fora 0 
@ Шә forüzszl 
zd for x > 1. 
u F(x) =0 fors <0 
(i) O forO <x <1 
zi fora > 1. 
2.8. Compute the means and variances of the probability laws obeyed by the
numerical valued random phenomena described in exercise 4.1 of Chapter 4.


2.9. For what values of r does the probability law, specified by the following
probability density function, possess (i) a finite mean, (ii) a finite variance:
raj 
(ж) = =a v1 
rozge " 
= otherwise. 


3. MOMENT-GENERATING FUNCTIONS 


The evaluation of expectations requires the use of operations of summation
and integration for which completely routine methods are not
available. We now discuss a method of evaluating the moments of a
probability law which, when available, requires the performance of only
one summation or integration, after which all the moments of the
probability law can be obtained by routine differentiation.

The moment-generating function of a probability law is a function ψ(·),
defined for all real numbers t by

(3.1)  $\psi(t) = E[e^{tx}].$

In words, ψ(t) is the expectation of the exponential function e^{tx}.

In the case of a discrete probability law, specified by a probability mass
function p(·), the moment-generating function is given by

(3.2)  $\psi(t) = \sum_{\text{over all points } x \text{ such that } p(x) > 0} e^{tx}\,p(x).$

In the case of a continuous probability law, specified by a probability
density function f(·), the moment-generating function is given by

(3.3)  $\psi(t) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx.$


Since, for fixed t, the integrand e^{tx} is a positive function of x, the sum or
integral defining ψ(t) always has a value, which is either finite or +∞. We say that a probability law possesses a
moment-generating function if there exists a positive number T such that
ψ(t) is finite for |t| < T. It may then be shown that all moments of the
probability law exist and may be expressed in terms of the successive
derivatives at t = 0 of the moment-generating function [see (3.5)]. We
have already shown that there are probability laws without finite means.
Consequently, probability laws that do not possess moment-generating
functions also exist. It may be seen in Chapter 9 that for every probability
law one can define a function, called the characteristic function, that always
exists and can be used as a moment-generating function to obtain those
moments that do exist.

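If a moment-generating function ψ(t) exists for |t| < T (for some T > 0),
then one may form its successive derivatives by successively differentiating
under the integral or summation sign. Consequently, we obtain

(3.4)  $\psi'(t) = \frac{d}{dt}\psi(t) = E\left[\frac{\partial}{\partial t} e^{tx}\right] = E[x e^{tx}],$
$\psi''(t) = \frac{d^2}{dt^2}\psi(t) = E\left[\frac{\partial}{\partial t}\left(x e^{tx}\right)\right] = E[x^2 e^{tx}],$
$\psi'''(t) = \frac{d^3}{dt^3}\psi(t) = E\left[\frac{\partial}{\partial t}\left(x^2 e^{tx}\right)\right] = E[x^3 e^{tx}],$
$\vdots$
$\psi^{(n)}(t) = \frac{d^n}{dt^n}\psi(t) = E[x^n e^{tx}].$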




Letting t = 0, we obtain

(3.5)  $\psi'(0) = E[x], \quad \psi''(0) = E[x^2], \quad \psi'''(0) = E[x^3], \quad \ldots, \quad \psi^{(n)}(0) = E[x^n].$

If the moment-generating function ψ(t) is finite for |t| < T (for some
T > 0), it then possesses a power-series expansion (valid for |t| < T):

(3.6)  $\psi(t) = 1 + E[x]\,t + E[x^2]\,\frac{t^2}{2!} + \cdots + E[x^n]\,\frac{t^n}{n!} + \cdots.$

To prove (3.6), use the definition of ψ(t) and the fact that

(3.7)  $e^{tx} = 1 + xt + \frac{x^2 t^2}{2!} + \cdots + \frac{x^n t^n}{n!} + \cdots.$

In view of (3.6), if one can readily obtain the power-series expansion of
ψ(t), then one can readily obtain the nth moment E[xⁿ] for any integer n,
since E[xⁿ] is the coefficient of tⁿ/n! in the power-series expansion of ψ(t).


► Example 3A. The Bernoulli probability law with parameter p has a
moment-generating function for −∞ < t < ∞:

(3.8)  $\psi(t) = e^{0 \cdot t} q + e^{1 \cdot t} p = pe^t + q,$

with derivatives

(3.9)  $\psi'(t) = pe^t, \qquad E[x] = \psi'(0) = p,$
$\psi''(t) = pe^t, \qquad E[x^2] = \psi''(0) = p.$ ◄

► Example 3B. The binomial probability law with parameters n and p
has a moment-generating function for −∞ < t < ∞:

(3.10)  $\psi(t) = \sum_{k=0}^{n} e^{tk} p(k) = \sum_{k=0}^{n} \binom{n}{k} (pe^t)^k q^{n-k} = (pe^t + q)^n,$

with derivatives

(3.11)  $\psi'(t) = npe^t(pe^t + q)^{n-1},$
$\psi''(t) = npe^t(pe^t + q)^{n-1} + n(n-1)p^2 e^{2t}(pe^t + q)^{n-2},$
$m = E[x] = \psi'(0) = np,$
$E[x^2] = \psi''(0) = np + n(n-1)p^2 = npq + n^2p^2.$ ◄
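Relation (3.5) between derivatives of ψ at t = 0 and moments can be checked numerically. This sketch is only an illustration. It approximates ψ′(0) and ψ″(0) for the binomial moment-generating function (3.10) by central finite differences, a numerical stand-in for the exact differentiation in the text. The parameters n = 12 and p = 0.25 are arbitrary choices.

```python
import math

def binomial_mgf(t, n, p):
    """psi(t) = (p e^t + q)^n, as in (3.10)."""
    return (p * math.exp(t) + (1 - p)) ** n

def derivatives_at_zero(psi, h=1e-4):
    """Central finite-difference estimates of psi'(0) and psi''(0)."""
    d1 = (psi(h) - psi(-h)) / (2 * h)
    d2 = (psi(h) - 2 * psi(0.0) + psi(-h)) / (h * h)
    return d1, d2

n, p = 12, 0.25                                   # illustrative parameters
m, ms = derivatives_at_zero(lambda t: binomial_mgf(t, n, p))
print(round(m, 4), round(ms - m * m, 4))          # approximately 3.0 and 2.25, i.e. np and npq
```

Finite differences carry both truncation and rounding error. The step h = 1e-4 is an assumed compromise between the two, so the agreement with np and npq is approximate rather than exact.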


TABLE 3A. SOME FREQUENTLY ENCOUNTERED DISCRETE PROBABILITY LAWS WITH THEIR MOMENTS AND GENERATING FUNCTIONS
[Table: Bernoulli, binomial, Poisson, geometric, negative binomial, and hypergeometric laws, with columns for parameters, probability mass function p(x), mean, variance, moment-generating function ψ(t) = E[e^{tx}], characteristic function, and central moments. Footnote: see M. G. Kendall, Advanced Theory of Statistics, Griffin, London, 1948.]

TABLE 3B. SOME FREQUENTLY ENCOUNTERED CONTINUOUS PROBABILITY LAWS WITH THEIR MOMENTS AND GENERATING FUNCTIONS
[Table: uniform (over a < x < b), normal, exponential, and gamma laws, with columns for parameters, probability density function f(x), mean, variance, moment-generating function, characteristic function, and central moments.]


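► Example 3C. The Poisson probability law with parameter λ has a
moment-generating function for −∞ < t < ∞:

(3.12)  $\psi(t) = \sum_{k=0}^{\infty} e^{tk} p(k) = e^{-\lambda} \sum_{k=0}^{\infty} \frac{(\lambda e^t)^k}{k!} = e^{\lambda(e^t - 1)},$

with derivatives

(3.13)  $\psi'(t) = \lambda e^t e^{\lambda(e^t - 1)}, \qquad E[x] = \psi'(0) = \lambda,$
$\psi''(t) = \lambda e^t e^{\lambda(e^t - 1)} + \lambda^2 e^{2t} e^{\lambda(e^t - 1)}, \qquad E[x^2] = \psi''(0) = \lambda + \lambda^2.$

Consequently, the variance σ² = E[x²] − E²[x] = λ. Thus for the Poisson
probability law the mean and the variance are equal. ◄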
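The equality of mean and variance for the Poisson law can also be seen by summing k p(k) and k² p(k) directly, as in the sketch below. The sums are truncated at k = 100, an assumption that is harmless for the illustrative λ values used. Probabilities are built by the recursion p(k) = p(k − 1)·λ/k, which avoids converting huge factorials to floating point.

```python
import math

def poisson_moments(lam, kmax=100):
    """Mean and variance of the Poisson law from truncated sums of k p(k) and k^2 p(k).

    p(k) is built recursively, p(k) = p(k-1) * lam / k, to avoid huge factorials."""
    pk = math.exp(-lam)          # p(0)
    mean = mean_square = 0.0
    for k in range(1, kmax):
        pk *= lam / k
        mean += k * pk
        mean_square += k * k * pk
    return mean, mean_square - mean ** 2

for lam in (0.5, 4.5, 20.0):     # illustrative parameter values
    mean, var = poisson_moments(lam)
    print(lam, round(mean, 10), round(var, 10))
```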


► Example 3D. The geometric probability law with parameter p has a
moment-generating function for t such that qe^t < 1:

(3.14)  $\psi(t) = \sum_{x=1}^{\infty} e^{tx} p(x) = pe^t \sum_{x=1}^{\infty} (qe^t)^{x-1} = \frac{pe^t}{1 - qe^t}.$

From (3.14) one may show that the mean and variance of the geometric
probability law are given by

(3.15)  $m = E[x] = \frac{1}{p}, \qquad \sigma^2 = E[x^2] - E^2[x] = \frac{q}{p^2}.$ ◄
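The sketch below checks (3.15) by direct (truncated) summation. It assumes the form p(x) = pq^{x−1}, x = 1, 2, ..., which is the form implied by the series in (3.14), that is, the number of trials up to and including the first success. The value p = 0.2 and the truncation point are illustrative choices.

```python
def geometric_moments(p, xmax=2000):
    """Truncated sums for p(x) = p * q**(x - 1), x = 1, 2, ... (number of trials
    up to and including the first success); terms beyond xmax are negligible here."""
    q = 1 - p
    mean = sum(x * p * q ** (x - 1) for x in range(1, xmax))
    mean_square = sum(x * x * p * q ** (x - 1) for x in range(1, xmax))
    return mean, mean_square - mean ** 2

p = 0.2                      # illustrative success probability
mean, var = geometric_moments(p)
print(round(mean, 9), round(var, 9), 1 / p, (1 - p) / p ** 2)
```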
P P 
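► Example 3E. The normal probability law with mean m and variance σ²
has a moment-generating function for −∞ < t < ∞:

(3.16)  $\psi(t) = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} e^{tx} e^{-\frac{1}{2}\left(\frac{x-m}{\sigma}\right)^2} dx = \frac{e^{mt}}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{t\sigma y} e^{-\frac{1}{2}y^2} dy = e^{mt + \frac{1}{2}\sigma^2 t^2}\,\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\frac{1}{2}(y - \sigma t)^2} dy = e^{mt + \frac{1}{2}\sigma^2 t^2}.$

From (3.16) one may show that the central moments of the normal
probability law are given by

(3.17)  $E[(x - m)^n] = 0$ if n = 3, 5, ...,
        $E[(x - m)^n] = 1 \cdot 3 \cdot 5 \cdots (n-1)\,\sigma^n$ if n = 2, 4, ....

An alternate method of deriving (3.17) is by use of (2.22) in Chapter 4. ◄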
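Formula (3.17) can be checked against numerical integration of (x − m)ⁿ times the normal density. In this sketch the midpoint rule, the cutoff at ±12σ, and σ = 1.5 are illustrative assumptions. The substitution y = x − m removes m entirely, which is consistent with central moments not depending on the mean.

```python
import math

def normal_central_moment_numeric(n, sigma, steps=20_000, width=12.0):
    """Midpoint-rule value of E[(x - m)^n] for the normal law.

    Substituting y = x - m removes m; the range is cut off at +-width*sigma,
    beyond which the density is negligible."""
    lo = -width * sigma
    h = 2 * width * sigma / steps
    total = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * h
        total += y ** n * math.exp(-0.5 * (y / sigma) ** 2)
    return total * h / (math.sqrt(2 * math.pi) * sigma)

def normal_central_moment_formula(n, sigma):
    """(3.17): 0 for odd n; 1*3*5*...*(n-1) * sigma^n for even n."""
    if n % 2 == 1:
        return 0.0
    return math.prod(range(1, n, 2)) * sigma ** n

sigma = 1.5                       # illustrative standard deviation
for n in range(1, 7):
    print(n, round(normal_central_moment_numeric(n, sigma), 6),
          normal_central_moment_formula(n, sigma))
```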


► Example 3F. The exponential probability law with parameter λ has a
moment-generating function for t < λ:

(3.18)  $\psi(t) = \lambda\int_0^{\infty} e^{tx} e^{-\lambda x}\,dx = \frac{\lambda}{\lambda - t} = \left(1 - \frac{t}{\lambda}\right)^{-1}.$

One may show from (3.18) that for the exponential probability law the mean
m and the standard deviation σ are equal, and are equal to the reciprocal of
the parameter λ. ◄
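In the spirit of (3.6), expand (3.18) as the geometric series (1 − t/λ)⁻¹ = Σ (t/λ)ⁿ. The coefficient of tⁿ/n! is then n!/λⁿ, so E[xⁿ] = n!/λⁿ. The sketch below compares these series coefficients with a midpoint-rule integration. It then recovers m = σ = 1/λ. The value λ = 0.5 and the integration cutoff are illustrative assumptions.

```python
import math

def exponential_moment_from_series(n, lam):
    """Coefficient of t^n/n! in (1 - t/lam)^(-1) = sum_n (t/lam)^n, i.e. n!/lam^n."""
    return math.factorial(n) / lam ** n

def exponential_moment_numeric(n, lam, steps=100_000, upper=60.0):
    """Midpoint-rule value of the integral of x^n * lam * exp(-lam x) over (0, upper/lam)."""
    b = upper / lam
    h = b / steps
    return sum(((i + 0.5) * h) ** n * lam * math.exp(-lam * (i + 0.5) * h)
               for i in range(steps)) * h

lam = 0.5                                            # illustrative parameter
m = exponential_moment_from_series(1, lam)
sd = math.sqrt(exponential_moment_from_series(2, lam) - m ** 2)
print(m, sd)  # 2.0 2.0  (both equal 1/lam)
```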


► Example 3G. The lifetime of a radioactive atom. It is shown in section
4 of Chapter 6 that the time between emissions of particles by a radioactive
atom obeys an exponential probability law with parameter λ. By example
3F, the mean time between emissions is 1/λ. The time between emissions
is called the lifetime of the atom. The half-life of the atom is defined as the
time T such that the probability is 1/2 that the lifetime will be greater than T.
Since the probability that the lifetime will be greater than a given number
t is e^{−λt}, it follows that T is the solution of e^{−λT} = 1/2, or T = log_e 2/λ. In
words, the half-life T is equal to the mean 1/λ multiplied by log_e 2. ◄
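A tiny numerical illustration of Example 3G follows. The emission rate λ = 0.1 per unit time is a hypothetical value chosen only for the example. The check confirms that the survival probability at T = log 2/λ is 1/2, and that T equals the mean lifetime 1/λ times log 2.

```python
import math

def half_life(lam):
    """T solving exp(-lam * T) = 1/2, i.e. T = log(2) / lam."""
    return math.log(2) / lam

lam = 0.1  # hypothetical emission rate per unit time (illustrative only)
T = half_life(lam)
print(round(T, 4), round(math.exp(-lam * T), 12))  # 6.9315 0.5
```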


THEORETICAL EXERCISES 


3.1. Generating function of moments about a point. Define the moment-generating
function of a probability law about a point c as a function ψ_c(·) defined for
all real numbers t by $\psi_c(t) = E[e^{t(x-c)}]$. Show that ψ_c(t) may be obtained
from ψ(t) by $\psi_c(t) = e^{-ct}\psi(t)$. The nth moment E[(x − c)ⁿ] of the
probability law about the point c is given by $E[(x-c)^n] = \psi_c^{(n)}(0)$ and may be
read off as the coefficient of tⁿ/n! in the power-series expansion of ψ_c(t).
3.2. The factorial moment-generating function Φ(u) of a probability law is
defined for all u such that |u| < 1 by

$\Phi(u) = E[(1+u)^x] = E[e^{x\log(1+u)}] = \psi[\log(1+u)].$

Its nth derivative evaluated at u = 0,

$\Phi^{(n)}(0) = E[x(x-1)\cdots(x-n+1)],$

is called the nth factorial moment of the probability law. From a knowledge
of the first n factorial moments of a probability law one may obtain a
knowledge of the first n moments of the probability law, and conversely.
Thus, for example,

(3.19)  $E[x(x-1)] = E[x^2] - E[x], \qquad E[x^2] = E[x(x-1)] + E[x].$

Equation (3.19) was implicitly used in section 2 in calculating certain second
moments and variances. Show that the first n moments of two distinct
probability laws coincide if and only if their first n factorial moments
coincide. Hint: Consult M. Kendall, The Advanced Theory of Statistics,
Vol. I, Griffin, London, 1948, p. 58.
3.3. The factorial moment-generating function of the probability law of the
number of matches in the matching problem. The number of matches
obtained by distributing, 1 to an urn, M balls, numbered 1 to M, among
M urns, numbered 1 to M, has a probability law specified by the probability
mass function

(3.20)  $p(m) = \frac{1}{m!}\sum_{k=0}^{M-m} \frac{(-1)^k}{k!}$ for m = 0, 1, 2, ..., M
        = 0 otherwise.

Show that the corresponding moment-generating function may be written

(3.21)  $\psi(t) = \sum_{m=0}^{M} e^{tm} p(m) = \sum_{r=0}^{M} \frac{1}{r!}(e^t - 1)^r.$

Consequently the factorial moment-generating function of the number of
matches may be written

(3.22)  $\Phi(u) = \sum_{r=0}^{M} \frac{u^r}{r!}.$

3.4. The first M moments of the number of matches in the problem of matching
M balls in M urns coincide with the first M moments of the Poisson
probability law with parameter λ = 1. Show that the factorial moment-generating
function of the Poisson law with parameter λ is given by

(3.23)  $\Phi_P(u) = e^{\lambda u} = \sum_{r=0}^{\infty} \frac{\lambda^r}{r!}\,u^r.$

By comparing (3.22) and (3.23), it follows that the first M factorial moments,
and, consequently, the first M moments of the probability law of the
number of matches and the Poisson probability law with parameter 1,
coincide.
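The comparison of (3.22) with (3.23) can be checked exactly for a particular M. The sketch below builds the matching-problem probabilities from (3.20) and computes factorial moments E[x(x − 1)···(x − r + 1)] directly. For M = 5 (an illustrative choice), the first M factorial moments should all equal 1, which is λ^r for the Poisson law with λ = 1, and the (M + 1)st should vanish, since x never exceeds M.

```python
from fractions import Fraction
from math import factorial

def matching_pmf(M):
    """(3.20): p(m) = (1/m!) * sum_{k=0}^{M-m} (-1)^k / k!, for m = 0, ..., M."""
    return [Fraction(1, factorial(m)) *
            sum(Fraction((-1) ** k, factorial(k)) for k in range(M - m + 1))
            for m in range(M + 1)]

def factorial_moment(pmf, r):
    """E[x (x-1) ... (x-r+1)] for a pmf given as a list indexed by x = 0, 1, ..."""
    total = Fraction(0)
    for x, px in enumerate(pmf):
        falling = 1
        for j in range(r):
            falling *= x - j          # falling factorial x (x-1) ... (x-r+1)
        total += falling * px
    return total

M = 5                                 # illustrative number of balls and urns
pmf = matching_pmf(M)
print(sum(pmf), [factorial_moment(pmf, r) for r in range(1, M + 2)])
```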


EXERCISES 


Compute the moment-generating function, mean, and variance of the
probability law specified by the probability density function, probability mass
function, or distribution function given.


3.1. (i) f(x) = e^{−x} for x > 0
         = 0 elsewhere.
     (ii) f(x) = e^{−(x−5)} for x > 5
         = 0 elsewhere.
3.2. (i) KOs р" reso 
275 
р = 0 elsewhere. 
Gi) f) = fre? forz >0 
=0 elsewhere. 
33. (i) P(x) = $0) fora 21,2 --- 
-0 elsewhere. 
Gi) p) = ex fors —0,1,-.- 


elsewhere. 


= 


SEC. 4 CHEBYSHEV'S INEQUALITY 225 
3.4. @) FG) = (=) 
(i) F(x) =0 forz <0 
2]1-—e^5 for x 2 0. 


3.5. Find the mean, variance, third central moment, and fourth central moment 
of the number of matches when (i) 4 balls are distributed in 4 urns, 1 to 
an urn, (ii) 3 balls are distributed in 3 urns, 1 to an urn. 


3.6. Find the factorial moment-generating function of the (i) binomial, (ii) 
Poisson, (iii) geometric probability laws and use it to obtain their means, 
variances, and third and fourth central moments. 


4. CHEBYSHEV'S INEQUALITY 


From a knowledge of the mean and variance of a probability law one
cannot in general determine the probability law. In the circumstance that
the functional form of the probability law is known up to several unspecified
parameters (for example, a probability law may be assumed to be a normal
distribution with parameters m and σ), it is often possible to relate the
parameters and the mean and variance. One may then use a knowledge of
the mean and variance to determine the probability law. In the case in
which the functional form of the probability law is unknown one can
obtain crude estimates of the probability law, which suffice for many
purposes, from a knowledge of the mean and variance.

For any probability law with finite mean m and finite variance σ², define
the quantity Q(h), for any h > 0, as the probability assigned to the interval
{x: m − hσ < x ≤ m + hσ} by the probability law. In terms of a
distribution function F(·) or a probability density function f(·),

(4.1)  $Q(h) = F(m + h\sigma) - F(m - h\sigma) = \int_{m-h\sigma}^{m+h\sigma} f(x)\,dx.$


Let us compute Q(h) in certain cases. For the normal probability law
with mean m and standard deviation σ,

(4.2)  $Q(h) = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{m-h\sigma}^{m+h\sigma} e^{-\frac{1}{2}\left(\frac{y-m}{\sigma}\right)^2} dy = \Phi(h) - \Phi(-h).$


For the exponential law with mean 1/λ,

(4.3)  $Q(h) = e^{-1}(e^{h} - e^{-h})$ for h ≤ 1,
       $Q(h) = 1 - e^{-(1+h)}$ for h > 1.




For the uniform distribution over the interval a to b, for h ≤ √3,

(4.4)  $Q(h) = \frac{1}{b-a}\int_{\frac{b+a}{2} - h\frac{b-a}{2\sqrt{3}}}^{\frac{b+a}{2} + h\frac{b-a}{2\sqrt{3}}} dx = \frac{h}{\sqrt{3}}.$


For the other frequently encountered probability laws one cannot so
readily evaluate Q(h). Nevertheless, the function Q(h) is still of interest,
since it is possible to obtain a lower bound for it which does not depend
on the probability law under consideration. This lower bound, known as 
Chebyshev's inequality, was named after the great Russian probabilist 
P. L. Chebyshev (1821-1894). 


Chebyshev's inequality. For any distribution function F(·) and any
h > 0,

(4.5)  $Q(h) = F(m + h\sigma) - F(m - h\sigma) \ge 1 - \frac{1}{h^2}.$

Note that (4.5) is trivially true for h < 1, since the right-hand side is then
negative.

We prove (4.5) for the case of a continuous probability law with probability
density function f(·). It may be proved in a similar manner (using
Stieltjes integrals, introduced in section 6) for a general distribution
function. The inequality (4.5) may be written in the continuous case

(4.6)  $\int_{m-h\sigma}^{m+h\sigma} f(x)\,dx \ge 1 - \frac{1}{h^2}.$

To prove (4.6), we first obtain the inequality

(4.7)  $\sigma^2 \ge \int_{-\infty}^{m-h\sigma} (x - m)^2 f(x)\,dx + \int_{m+h\sigma}^{\infty} (x - m)^2 f(x)\,dx,$

which follows, since the variance σ² is equal to the sum of the two integrals
on the right-hand side of (4.7), plus the nonnegative quantity
$\int_{m-h\sigma}^{m+h\sigma} (x - m)^2 f(x)\,dx$. Now for x ≤ m − hσ, it holds that (x − m)² ≥
h²σ². Similarly, x ≥ m + hσ implies (x − m)² ≥ h²σ². By replacing
(x − m)² by these lower bounds in (4.7), we obtain

(4.8)  $\sigma^2 \ge h^2\sigma^2\left[\int_{-\infty}^{m-h\sigma} f(x)\,dx + \int_{m+h\sigma}^{\infty} f(x)\,dx\right].$

The sum of the two integrals in (4.8) is equal to 1 − Q(h). Therefore
(4.8) implies that 1 − Q(h) ≤ 1/h², and (4.5) is proved.




In Fig. 4A the function Q(h), given by (4.2), (4.3), and (4.4), and the
lower bound for Q(h), given by Chebyshev's inequality, are plotted.

Fig. 4A. Graphs of the function Q(h). (Curves labeled: normal distribution;
exponential distribution; Chebyshev's lower bound for Q(h).)

In terms of the observed value X of a numerical valued random phenomenon,
Chebyshev's inequality may be reformulated as follows. The
quantity Q(h) is then essentially equal to P[|X − m| ≤ hσ]; in words,
Q(h) is equal to the probability that an observed value of a numerical
valued random phenomenon, with distribution function F(·), will lie in an
interval centered at the mean and of length 2h standard deviations.
Chebyshev's inequality may be reformulated: for any h > 0

(4.9)  $P[|X - m| \le h\sigma] \ge 1 - \frac{1}{h^2}, \qquad P[|X - m| > h\sigma] \le \frac{1}{h^2}.$

Chebyshev's inequality (with h = 4) states that the probability is at
least 0.9375 that an observed value X will lie within four standard deviations
of the mean, whereas the probability is at least 0.99 that an observed
value X will lie within ten standard deviations of the mean. Thus, in terms
of the standard deviation σ (and consequently in terms of the variance σ²),
we can state intervals in which, with very high probability, an observed
value of a numerical valued random phenomenon may be expected to lie.
It is this fact that makes the variance a measure of the spread, or
dispersion, of the probability mass that a probability law distributes over
the real line.
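The curves of Fig. 4A can be tabulated with a few lines of code, which shows how conservative Chebyshev's bound is for these three laws. The sketch uses (4.2), written as erf(h/√2); (4.3); and (4.4). Two assumptions go beyond the text: Q(h) for the uniform law is capped at 1 once h exceeds √3, where the interval covers all of (a, b), and the bound 1 − 1/h² is clipped at 0 for h < 1.

```python
import math

def q_normal(h):
    """(4.2): Phi(h) - Phi(-h), which equals erf(h / sqrt(2))."""
    return math.erf(h / math.sqrt(2))

def q_exponential(h):
    """(4.3), for the exponential law with mean 1/lambda."""
    if h <= 1:
        return math.exp(-1) * (math.exp(h) - math.exp(-h))
    return 1 - math.exp(-(1 + h))

def q_uniform(h):
    """(4.4): h / sqrt(3), capped at 1 once the interval covers all of (a, b)."""
    return min(h / math.sqrt(3), 1.0)

def chebyshev_bound(h):
    """(4.5): 1 - 1/h^2, clipped at 0 since a probability is nonnegative."""
    return max(0.0, 1 - 1 / h ** 2)

for h in (1.0, 1.5, 2.0, 3.0, 4.0):
    print(h, round(q_normal(h), 4), round(q_exponential(h), 4),
          round(q_uniform(h), 4), round(chebyshev_bound(h), 4))
```

At h = 2, for instance, the bound guarantees only 0.75, while the normal and exponential laws actually assign about 0.95 to the interval.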




Generalizations of Chebyshev's inequality. As a practical tool for using 
the lower-order moments of a probability law for obtaining inequalities 
on its distribution function, Chebyshev's inequality can be improved upon 
if various additional facts about the distribution function are known. 
Expository surveys of various generalizations of Chebyshev's inequality 
are given by H. J. Godwin, “On generalizations of Tchebychef's inequality,"
Journal of the American Statistical Association, Vol. 50 (1955), pp. 923-945, 
and by C. L. Mallows, “Generalizations of Tchebycheff’s inequalities,” 
Journal of the Royal Statistical Society, Series B, Vol. 18 (1956), pp. 139- 
176 (with discussion). 


EXERCISES 


4.1. Use Chebyshev's inequality to determine how many times a fair coin must 
be tossed in order that the probability will be at least 0.90 that the ratio 
of the observed number of heads to the number of tosses will lie between 
0.4 and 0.6. 

4.2. Suppose that the number of airplanes arriving at a certain airport in any 
20-minute period obeys a Poisson probability law with mean 100. Use 
Chebyshev's inequality to determine a lower bound for the probability 
that the number of airplanes arriving in a given 20-minute period will be 
between 80 and 120. 

4.3. Consider a group of N men playing the game of “odd man out" (that is, 
they repeatedly perform the experiment in which each man independently 
tosses a fair coin until there is an “odd” man, in the sense that either
exactly 1 of the N coins falls heads or exactly 1 of the N coins falls tails). 
Find, for (i) N = 4, (ii) N = 8, the exact probability that the number of 
repetitions of the experiment required to conclude the game will be within 
2 standard deviations of the mean number of repetitions required to con- 
clude the game. Compare your answer with the lower bound given by 
Chebyshev’s inequality. 

4.4. For Pareto's distribution, defined in theoretical exercise 2.2, compute and
graph the function Q(h), for A = 1 and r = 3 and 4, and compare it with
the lower bound given by Chebyshev’s inequality.


5. THE LAW OF LARGE NUMBERS FOR INDEPENDENT 
REPEATED BERNOULLI TRIALS 


Consider an experiment with two possible outcomes, denoted by success 
and failure. Suppose, however, that the probability p of success at each 
trial is unknown. According to the frequency interpretation of probability, 
p represents the relative frequency of successes in an indefinitely prolonged 
series of trials. Consequently, one might think that in order to determine 
p one must only perform a long series of trials and take as the value of p
the observed relative frequency of success. The question arises: can one 


SEC. 5 THE LAW OF LARGE NUMBERS 229 


justify this procedure, not by appealing to the frequency interpretation of 
probability theory, but by appealing to the mathematical theory of 
probability?

The mathematical theory of probability is a logical construct, consisting 
of conclusions logically deduced from the axioms of probability theory. 
These conclusions are applicable to the world of real experience in the 
sense that they are conclusions about real phenomena, which are assumed 
to satisfy the axioms. We now show that one can reach a conclusion within 
the mathematical theory of probability that may be interpreted to justify 
the frequency interpretation of probability (and consequently may be used 
to justify the procedure described for estimating p). This result is known 
as the law of large numbers, since it applies to the outcome of a large 
number of trials. The law of large numbers we are about to investigate 
may be considerably generalized. Consequently, the version to be discussed 
is called the Bernoulli law of large numbers, as it was first discovered by 
Jacob Bernoulli and published in his posthumous book Ars conjectandi 
(1713). 

The Bernoulli Law of Large Numbers. Let $S_n$ be the observed number of successes in $n$ independent repeated Bernoulli trials, with probability $p$ of success at each trial. Let

$$f_n = \frac{S_n}{n} \tag{5.1}$$

denote the relative frequency of successes in the $n$ trials. Then, for any positive number $\epsilon$, no matter how small, it follows that

$$\lim_{n \to \infty} P[|f_n - p| \le \epsilon] = 1, \tag{5.2}$$

$$\lim_{n \to \infty} P[|f_n - p| > \epsilon] = 0. \tag{5.3}$$

In words, (5.2) and (5.3) state that as the number $n$ of trials tends to infinity the relative frequency of successes in $n$ trials tends to the true probability $p$ of success at each trial, in the probabilistic sense that any nonzero difference $\epsilon$ between $f_n$ and $p$ becomes less and less probable of observation as the number of trials is increased indefinitely.

Bernoulli proved (5.3) by a tedious evaluation of the probability in (5.3).


Using Chebyshev's inequality, one can give a very simple proof of (5.3). By using the fact that the probability law of $S_n$ has mean $np$ and variance $npq$, one may prove that the probability law of $f_n$ has mean $p$ and variance $p(1 - p)/n$. Consequently, for any $\epsilon > 0$,

$$P[|f_n - p| > \epsilon] \le \frac{p(1 - p)}{n\epsilon^2}. \tag{5.4}$$


Now, for any value of $p$ in the interval $0 \le p \le 1$,

$$p(1 - p) \le \tfrac{1}{4}, \tag{5.5}$$

using the fact that $4p(1 - p) - 1 = -(2p - 1)^2 \le 0$. Consequently, for any $\epsilon > 0$,

$$P[|f_n - p| > \epsilon] \le \frac{1}{4n\epsilon^2} \to 0 \quad \text{as } n \to \infty, \tag{5.6}$$


no matter what the true value of $p$. To prove (5.2), one uses (5.3) and the fact that

$$P[|f_n - p| \le \epsilon] = 1 - P[|f_n - p| > \epsilon]. \tag{5.7}$$
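To see how conservative the bound (5.6) can be, the following Python sketch compares the exact binomial probability $P[|f_n - p| > \epsilon]$ with $1/(4n\epsilon^2)$. The choices $p = 1/2$, $\epsilon = 0.05$ and the helper names are illustrative assumptions. Exact rational arithmetic (`fractions.Fraction`) is used only to decide whether $|f_n - p| > \epsilon$, so that boundary cases such as $k = 55$, $n = 100$ are not misclassified by floating-point rounding.

```python
import math
from fractions import Fraction

def binom_pmf(k: int, n: int, p: Fraction) -> float:
    """C(n, k) p^k (1-p)^(n-k) for 0 < p < 1, via log-gamma so large n does not underflow."""
    pf = float(p)
    return math.exp(math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * math.log(pf) + (n - k) * math.log(1 - pf))

def exact_tail(n: int, p: Fraction, eps: Fraction) -> float:
    """Exact P[|f_n - p| > eps] with f_n = S_n / n; the comparison uses exact rationals."""
    return sum(binom_pmf(k, n, p) for k in range(n + 1) if abs(Fraction(k, n) - p) > eps)

def chebyshev_bound(n: int, eps: Fraction) -> float:
    """Right-hand side of (5.6): 1 / (4 n eps^2), valid whatever the true p."""
    return float(1 / (4 * n * eps * eps))

p, eps = Fraction(1, 2), Fraction(1, 20)
results = {n: (exact_tail(n, p, eps), chebyshev_bound(n, eps)) for n in (100, 1000, 10000)}
for n, (exact, bound) in results.items():
    print(f"n={n:>5}  exact={exact:.6f}  Chebyshev bound={bound:.4f}")
```

At $n = 100$ the bound equals 1 and says nothing, although the exact probability is well below 1; the bound is nevertheless enough to force the limit (5.3).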


It is shown in section 5 of Chapter 8 that the foregoing method of proof, using Chebyshev's inequality, permits one to prove that if $X_1, X_2, \ldots, X_n, \ldots$ is a sequence of independent observations of a numerical valued random phenomenon whose probability law has mean $m$, then for any $\epsilon > 0$

$$\lim_{n \to \infty} P\left[\left|\frac{X_1 + X_2 + \cdots + X_n}{n} - m\right| > \epsilon\right] = 0. \tag{5.8}$$


The result given by (5.8) is known as the law of large numbers. 
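A simulation cannot prove (5.3), but it can make it concrete. The sketch below assumes Python's `random` module as the source of independent Bernoulli trials with a hypothetical success probability $p = 0.3$, and records $f_n$ at a few values of $n$; with a fixed seed the run is reproducible.

```python
import random

def relative_frequencies(p: float, checkpoints, seed: int = 1) -> dict:
    """Simulate independent Bernoulli(p) trials; return {n: f_n} at each checkpoint n."""
    rng = random.Random(seed)          # fixed seed makes the run reproducible
    successes = trials = 0
    out = {}
    for n in sorted(checkpoints):
        while trials < n:
            successes += rng.random() < p   # a success occurs with probability p
            trials += 1
        out[n] = successes / n
    return out

freqs = relative_frequencies(0.3, [10, 100, 1_000, 10_000, 100_000])
for n, f in freqs.items():
    print(f"n={n:>6}  f_n={f:.4f}  |f_n - p|={abs(f - 0.3):.4f}")
```

One simulated path illustrates the tendency; the law of large numbers itself is a statement about probabilities, not about any single sequence of trials.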

The Bernoulli law of large numbers states that the observed relative frequency $f_n$ of successes in $n$ trials can be employed as an estimate of the unknown value of $p$, and that the probability that this estimate is in error by more than any preassigned amount tends to 0 as the number of trials becomes infinitely large. In practice, only a finite number of trials is performed. Consequently, the number of trials must be determined so that, with high probability, the observed relative frequency is within a preassigned distance $\epsilon$ of $p$. In symbols, given any number $\alpha$ (with $0 < \alpha < 1$), one desires to find $n$ so that


$$P[|f_n - p| \le \epsilon \mid p] \ge \alpha \quad \text{for all } p \text{ in } 0 < p < 1, \tag{5.9}$$

where we write $P[\,\cdot \mid p]$ to indicate that the probability is being calculated under the assumption that $p$ is the true probability of success at each trial.

One may obtain an expression for the value of $n$ that satisfies (5.9) by means of Chebyshev's inequality. Since

$$P[|f_n - p| \le \epsilon \mid p] \ge 1 - \frac{1}{4n\epsilon^2} \quad \text{for all } p \text{ in } 0 < p < 1, \tag{5.10}$$

it follows that (5.9) is satisfied if $n$ is chosen so that

$$n \ge \frac{1}{4\epsilon^2(1 - \alpha)}. \tag{5.11}$$


► Example 5A. How many trials of an experiment with two outcomes, called A and B, should be performed in order that the probability be 95% or better that the observed relative frequency of occurrences of A will differ from the probability $p$ of occurrence of A by no more than 0.02? Here $\alpha = 0.95$, $\epsilon = 0.02$. Therefore, by (5.11), the number $n$ of trials should be chosen so that $n \ge 1/[4(0.02)^2(1 - 0.95)] = 12{,}500$. ◄
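Inequality (5.11) turns directly into a sample-size rule. This minimal sketch (the function name is hypothetical) evaluates it with exact fractions, since a floating-point quotient that lands a hair above an integer would make `math.ceil` overshoot by one.

```python
import math
from fractions import Fraction

def chebyshev_sample_size(alpha, eps) -> int:
    """Smallest integer n satisfying (5.11): n >= 1 / (4 eps^2 (1 - alpha))."""
    alpha, eps = Fraction(alpha), Fraction(eps)   # exact arithmetic avoids boundary rounding
    return math.ceil(1 / (4 * eps * eps * (1 - alpha)))

print(chebyshev_sample_size("0.95", "0.02"))   # 12500, as in Example 5A
```

Passing the parameters as decimal strings lets `Fraction` read them exactly, so 0.95 is treated as 19/20 rather than as its binary approximation.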

The estimate of $n$ given by (5.11) can be improved upon. In section 2 of Chapter 6 we prove the normal approximation to the binomial law. In particular, it is shown that if $p$ is the probability of success at each trial, then the number $S_n$ of successes in $n$ independent repeated Bernoulli trials approximately satisfies, for any $h > 0$,

$$P\left[\frac{|S_n - np|}{\sqrt{npq}} \le h\right] \approx 2\Phi(h) - 1. \tag{5.12}$$

Consequently, the relative frequency of successes satisfies, for any $\epsilon > 0$,

$$P[|f_n - p| \le \epsilon] \approx 2\Phi\left(\epsilon\sqrt{n/pq}\right) - 1. \tag{5.13}$$

To obtain (5.13) from (5.12), let $h = \epsilon\sqrt{n/pq}$; the event $|f_n - p| \le \epsilon$ is the same as the event $|S_n - np| \le n\epsilon = h\sqrt{npq}$.
Define $K(\alpha)$ as the solution of the equation

$$2\Phi(K(\alpha)) - 1 = \int_{-K(\alpha)}^{K(\alpha)} \frac{1}{\sqrt{2\pi}} e^{-y^2/2}\, dy = \alpha. \tag{5.14}$$

A table of selected values of $K(\alpha)$ is given in Table 5A.

TABLE 5A

α         K(α)
0.50      0.675
0.6827    1.000
0.90      1.645
0.95      1.960
0.9546    2.000
0.99      2.576
0.9973    3.000
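Equation (5.14) says $\Phi(K(\alpha)) = (1 + \alpha)/2$, so $K(\alpha)$ is a normal quantile. Assuming Python 3.8 or later, `statistics.NormalDist().inv_cdf` can regenerate Table 5A; note that the table's entries are rounded (for instance $K(0.50) = 0.6745$ is listed as 0.675).

```python
from statistics import NormalDist

def K(alpha: float) -> float:
    """Solve 2*Phi(K) - 1 = alpha, equation (5.14): K(alpha) = Phi^{-1}((1 + alpha) / 2)."""
    return NormalDist().inv_cdf((1 + alpha) / 2)

table_5a = {a: K(a) for a in (0.50, 0.6827, 0.90, 0.95, 0.9546, 0.99, 0.9973)}
for a, k in table_5a.items():
    print(f"{a:<7} {k:.4f}")
```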


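From (5.13) we may obtain the conclusion that

$$P[|f_n - p| \le \epsilon] \ge \alpha \quad \text{if } \epsilon\sqrt{n/pq} \ge K(\alpha). \tag{5.15}$$

To justify (5.15), note that, since $\Phi$ is increasing, $\epsilon\sqrt{n/pq} \ge K(\alpha)$ implies that the right-hand side of (5.13) is greater than or equal to the left-hand side of (5.14), which equals $\alpha$.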


Since $pq \le \tfrac{1}{4}$ for all $p$, we finally obtain from (5.15) that (5.9) will hold if

$$n \ge \frac{K^2(\alpha)}{4\epsilon^2}. \tag{5.16}$$


► Example 5B. If $\alpha = 0.95$ and $\epsilon = 0.02$, then according to (5.16) $n$ should be chosen so that $n \ge (1.960)^2/[4(0.02)^2] \approx 2401$, or about 2500 if $K(0.95)$ is rounded to 2. Thus the number of trials required for $f_n$ to be within 0.02 of $p$ with probability greater than 95% is approximately 2500, which is one fifth of the 12,500 trials that Chebyshev's inequality states is required. ◄
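The sketch below, assuming Python 3.8+ and `statistics.NormalDist` (function names hypothetical), evaluates (5.16) and then checks the result against the normal approximation (5.13). At $p = 1/2$, where $pq$ attains its maximum of 1/4, the approximate coverage is just above $\alpha$; for other values of $p$ it is higher.

```python
import math
from statistics import NormalDist

def normal_sample_size(alpha: float, eps: float) -> int:
    """Smallest integer n satisfying (5.16): n >= K(alpha)^2 / (4 eps^2)."""
    k = NormalDist().inv_cdf((1 + alpha) / 2)        # K(alpha), from (5.14)
    return math.ceil(k * k / (4 * eps * eps))

def approx_coverage(n: int, p: float, eps: float) -> float:
    """Normal approximation (5.13) to P[|f_n - p| <= eps]."""
    return 2 * NormalDist().cdf(eps * math.sqrt(n / (p * (1 - p)))) - 1

n = normal_sample_size(0.95, 0.02)
print(n, approx_coverage(n, 0.5, 0.02), approx_coverage(n, 0.1, 0.02))
```

With the unrounded quantile $K(0.95) = 1.95996\ldots$ the rule gives 2401, in line with the figure in Example 5B.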


EXERCISES 


5.1. A sample is taken to find the proportion p of smokers in a certain population.
Find a sample size so that the probability is (i) 0.95 or better,
(ii) 0.99 or better that the observed proportion of smokers will differ
from the true proportion of smokers by less than (a) 1%, (b) 10%.


5.2. Consider an urn that contains 10 balls numbered 0 to 9, each of which is 
equally likely to be drawn; thus choosing a ball from the urn is equivalent 
to choosing a number 0 to 9; this experiment is sometimes described by 
saying a random digit has been chosen. Let n balls be chosen with replace- 
ment. 


(i) What does the law of large numbers tell you about occurrences of 9's in
the n drawings?
(ii) How many drawings must be made in order that, with probability 0.95 


or better, the relative frequency of occurrence of 9's will be between 0.09 
and 0.11? 


5.3. If you wish to estimate the proportion of engineers and scientists who have 
studied probability theory and you wish your estimate to be correct, 
within 2%, with probability 0.95 or better, how large a sample should you 
take (i) if you feel confident that the true proportion is less than 0.2, (ii) if 
you have no idea what the true proportion is?


5.4. The law of large numbers, in popular terminology, is called the law of 
averages. Comment on the following advice. When you toss a fair coin 
to decide a bet, let your companion do the calling. “Heads” is called 7 
times out of 10. The simple law of averages gives the man who listens a 
tremendous advantage. 


6. MORE ABOUT EXPECTATION 


In this section we define the expectation of a function with respect to 
(i) a probability law specified by its distribution function, and (ii) a 
numerical n-tuple valued random phenomenon. 


SEC. 6 MORE ABOUT EXPECTATION 233 


Stieltjes Integral. In section 2 we defined the expectation of a continuous 
function g(x) with respect to a probability law, which is specified by a 
probability mass function or by a probability density function. We now 
consider the case of a general probability law, which is specified by its 
distribution function $F(\cdot)$.

In order to define the expectation with respect to a probability law specified by a distribution function $F(\cdot)$, we require a generalization of the notion of integral, which goes under the name of the Stieltjes integral. Given a continuous function $g(x)$, a distribution function $F(\cdot)$, and a half-open interval $(a, b]$ on the real line (that is, $(a, b]$ consists of all the points strictly greater than $a$ and less than or equal to $b$), we define the Stieltjes integral of $g(\cdot)$ with respect to $F(\cdot)$ over $(a, b]$, written $\int_{a+}^{b} g(x)\, dF(x)$, as follows. We start with a partition of the interval $(a, b]$ into $n$ subintervals $(x_{i-1}, x_i]$, in which $x_0, x_1, \ldots, x_n$ are $(n + 1)$ points chosen so that $a = x_0 < x_1 < \cdots < x_n = b$. We then choose a set of points $x_1', x_2', \ldots, x_n'$, one in each subinterval, so that $x_{i-1} < x_i' \le x_i$ for $i = 1, 2, \ldots, n$. We define

$$\int_{a+}^{b} g(x)\, dF(x) = \lim \sum_{i=1}^{n} g(x_i')\,[F(x_i) - F(x_{i-1})], \tag{6.1}$$

in which the limit is taken over all partitions of the interval $(a, b]$, as the maximum length of subinterval in the partition tends to 0.
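The definition (6.1) can be mimicked numerically. The following Python sketch uses equal-length subintervals and tags each at its midpoint, which is one admissible choice of $x_i'$; both the tag rule and the two example distribution functions (a fair die, whose $F$ has jumps, and an exponential law with mean 1, whose $F$ is continuous) are assumptions made for illustration. As the mesh shrinks, the sums approach 3.5 and 1, the values given by (6.3) and (6.2).

```python
import math

def stieltjes_sum(g, F, a: float, b: float, n: int) -> float:
    """Approximating sum of (6.1) over (a, b]: equal subintervals (x_{i-1}, x_i],
    each tagged at its midpoint x_i', which satisfies x_{i-1} < x_i' <= x_i."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    return sum(g((xs[i - 1] + xs[i]) / 2) * (F(xs[i]) - F(xs[i - 1])) for i in range(1, n + 1))

def die_F(x: float) -> float:
    """Distribution function of a fair die: jumps of 1/6 at x = 1, ..., 6."""
    return min(max(math.floor(x), 0), 6) / 6

def expo_F(x: float) -> float:
    """Exponential distribution function with mean 1 (a continuous F)."""
    return 1 - math.exp(-x) if x > 0 else 0.0

def identity(x: float) -> float:
    return x

for n in (60, 600, 6000):
    print(n, stieltjes_sum(identity, die_F, 0.0, 6.0, n), stieltjes_sum(identity, expo_F, 0.0, 40.0, n))
```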
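It may be shown that if $F(\cdot)$ is specified by a probability density function $f(\cdot)$, then

$$\int_{a+}^{b} g(x)\, dF(x) = \int_{a}^{b} g(x) f(x)\, dx, \tag{6.2}$$

whereas if $F(\cdot)$ is specified by a probability mass function $p(\cdot)$, then

$$\int_{a+}^{b} g(x)\, dF(x) = \sum_{\substack{\text{over all } x \text{ such that} \\ a < x \le b \text{ and } p(x) > 0}} g(x)\, p(x). \tag{6.3}$$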


The Stieltjes integral of the continuous function $g(\cdot)$, with respect to the distribution function $F(\cdot)$ over the whole real line, is defined by

$$\int_{-\infty}^{\infty} g(x)\, dF(x) = \lim_{\substack{a \to -\infty \\ b \to \infty}} \int_{a+}^{b} g(x)\, dF(x). \tag{6.4}$$

The discussion in section 2 in regard to the existence and finiteness of integrals over the real line applies also to Stieltjes integrals. We say that $\int_{-\infty}^{\infty} g(x)\, dF(x)$ exists if and only if $\int_{-\infty}^{\infty} |g(x)|\, dF(x)$ is finite. Thus only absolutely convergent Stieltjes integrals are to be invested with sense.


We now define the expectation of a continuous function $g(\cdot)$, with respect to a probability law specified by a distribution function $F(\cdot)$, as the Stieltjes integral of $g(\cdot)$ with respect to $F(\cdot)$ over the infinite real line; in symbols,

$$E[g(x)] = \int_{-\infty}^{\infty} g(x)\, dF(x). \tag{6.5}$$


Stieltjes integrals are only of theoretical interest. They provide a compact way of defining, and working with, the properties of expectation. In practice, one evaluates a Stieltjes integral by breaking it up into a sum of an ordinary integral and an ordinary summation by means of the following theorem: if there exist a probability density function $f(\cdot)$, a probability mass function $p(\cdot)$, and nonnegative constants $c_1$ and $c_2$, whose sum is 1, such that for every $x$

$$F(x) = c_1 \int_{-\infty}^{x} f(x')\, dx' + c_2 \sum_{\substack{\text{over all } x' \le x \\ \text{such that } p(x') > 0}} p(x'), \tag{6.6}$$

then for any continuous function $g(\cdot)$

$$\int_{-\infty}^{\infty} g(x)\, dF(x) = c_1 \int_{-\infty}^{\infty} g(x) f(x)\, dx + c_2 \sum_{\substack{\text{over all } x \text{ such} \\ \text{that } p(x) > 0}} g(x)\, p(x). \tag{6.7}$$
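As a check on (6.6) and (6.7), the sketch below builds a hypothetical mixed distribution: $c_1 = c_2 = 1/2$, an exponential density with mean 1, and mass 1/2 at each of the points 0 and 2. It evaluates $E[x^2]$ twice, once from the right-hand side of (6.7) (using a midpoint rule for the ordinary integral) and once from a Stieltjes approximating sum over $(-1, 59]$ built directly from $F$. Both should be close to $\tfrac12 \cdot 2 + \tfrac12(0 \cdot \tfrac12 + 4 \cdot \tfrac12) = 2$.

```python
import math

C1, C2 = 0.5, 0.5                 # hypothetical constants with C1 + C2 = 1
MASS = {0.0: 0.5, 2.0: 0.5}       # hypothetical probability mass function p(x)

def f(x: float) -> float:
    """Exponential density with mean 1 (the continuous part)."""
    return math.exp(-x) if x >= 0 else 0.0

def F(x: float) -> float:
    """Distribution function of the form (6.6)."""
    continuous = 1 - math.exp(-x) if x >= 0 else 0.0
    discrete = sum(q for point, q in MASS.items() if point <= x)
    return C1 * continuous + C2 * discrete

def expectation_67(g, upper: float = 60.0, steps: int = 200_000) -> float:
    """Right-hand side of (6.7); the ordinary integral uses the midpoint rule on [0, upper]."""
    h = upper / steps
    integral = h * sum(g((i + 0.5) * h) * f((i + 0.5) * h) for i in range(steps))
    return C1 * integral + C2 * sum(g(point) * q for point, q in MASS.items())

def stieltjes_sum(g, a: float, b: float, n: int) -> float:
    """Midpoint-tagged approximating sum (6.1) of the integral of g dF over (a, b]."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    return sum(g((xs[i - 1] + xs[i]) / 2) * (F(xs[i]) - F(xs[i - 1])) for i in range(1, n + 1))

def square(x: float) -> float:
    return x * x

print(expectation_67(square), stieltjes_sum(square, -1.0, 59.0, 60_000))
```

The Stieltjes sum needs no knowledge of $c_1$, $c_2$, $f$ or $p$ separately; (6.7) is what makes the split into an integral plus a sum legitimate.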


In giving the proofs of various propositions about probability laws we 
most often confine ourselves to the case in which the probability law is 
specified by a probability density function, for here we may employ only 
ordinary integrals. However, the properties of Stieltjes integrals are very 
much the same as those of ordinary Riemann integrals; consequently, the 
proofs we give are immediately translatable into proofs of the general case 
that require the use of Stieltjes integrals. 


Expectations with Respect to Numerical n-Tuple Valued Random 
Phenomena. The foregoing ideas extend immediately to a numerical 
n-tuple valued random phenomenon. Given the distribution function $F(x_1, x_2, \ldots, x_n)$ of such a random phenomenon and any continuous function $g(x_1, \ldots, x_n)$ of $n$ real variables, we define the expectation of the function with respect to the random phenomenon by

$$E[g(x_1, x_2, \ldots, x_n)] = \int_{R_n} g(x_1, x_2, \ldots, x_n)\, dF(x_1, x_2, \ldots, x_n), \tag{6.8}$$

in which the integral is a Stieltjes integral over the space $R_n$ of all n-tuples $(x_1, x_2, \ldots, x_n)$ of real numbers. We shall not write out here the definition of this integral.


We note that (6.2) and (6.3) generalize. If the distribution function $F(x_1, x_2, \ldots, x_n)$ is specified by a probability density function $f(x_1, x_2, \ldots, x_n)$, so that (7.7') of Chapter 4 holds, then

$$E[g(x_1, x_2, \ldots, x_n)] = \underbrace{\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty}}_{n \text{ integrals}} g(x_1, x_2, \ldots, x_n)\, f(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n. \tag{6.9}$$

If the distribution function $F(x_1, x_2, \ldots, x_n)$ is specified by a probability mass function $p(x_1, x_2, \ldots, x_n)$, so that (7.8) of Chapter 4 holds, then

$$E[g(x_1, x_2, \ldots, x_n)] = \sum_{\substack{\text{over all } (x_1, \ldots, x_n) \\ \text{such that } p(x_1, \ldots, x_n) > 0}} g(x_1, x_2, \ldots, x_n)\, p(x_1, x_2, \ldots, x_n). \tag{6.10}$$
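For a discrete law, (6.10) is a finite sum that can be computed exactly. The sketch below takes, as an illustrative assumption, the probability mass function of two independent fair dice and the functions $g(x_1, x_2) = x_1 + x_2$ and $g(x_1, x_2) = \max(x_1, x_2)$; `fractions.Fraction` keeps the answers exact.

```python
from fractions import Fraction
from itertools import product

def expectation_discrete(g, pmf) -> Fraction:
    """Equation (6.10): sum of g(x) p(x) over all points x with p(x) > 0."""
    return sum((g(*x) * px for x, px in pmf.items() if px > 0), Fraction(0))

# p(x1, x2) = 1/36 for x1, x2 in {1, ..., 6}: two independent fair dice (illustrative)
dice = {x: Fraction(1, 36) for x in product(range(1, 7), repeat=2)}

def total(x1, x2):
    return x1 + x2

print(expectation_discrete(total, dice))   # 7
print(expectation_discrete(max, dice))     # 161/36
```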


EXERCISES 


6.1. Compute the mean, variance, and moment-generating function of each of
the probability laws specified by the following distribution functions.
(Recall that [x] denotes the largest integer less than or equal to x.)


(i) F(x) =0 fora <0 
-1-— je- 418) = Ze [2/3] forz > 0. 

(ii) F(x) =0 fors <0 
= e2 fz] 2 Pond 

-sfe d +> 2 ki ог > 0. 

(iii) F(x) =0 for x <1 

1 
=1 -z 34 fors 2 1. 
(іу) F(a) = 0 forx < 1 
2 1 
-1-3;- 39 for x > 1. 


6.2. Compute the expectation of the function Elči T2) = ym, with respect to 
the probability laws of the numerical 2-tuple valued random phenomenon 
specified by the following probability density functions or probability mass 


functions: 

9 уез) = exp (2al — 2180 

(ii) $f(x_1, x_2) = \dfrac{1}{(b_1 - a_1)(b_2 - a_2)}$ if $a_1 < x_1 < b_1$ and $a_2 < x_2 < b_2$;
$= 0$ otherwise.


(iii) $f(x_1, x_2) = \dfrac{1}{2\pi\sqrt{1 - \rho^2}} \exp\left[-\dfrac{x_1^2 + x_2^2 - 2\rho x_1 x_2}{2(1 - \rho^2)}\right]$, in which $|\rho| < 1$.

(iv) $p(x_1, x_2) = \tfrac{1}{36}$ if $x_1 = 1, 2, \ldots, 6$ and $x_2 = 1, 2, \ldots, 6$;
$= 0$ otherwise.

(v) (узд 6 2\ %1 +22 /]Y 12-2, -7, PH, 23) a Ыы (3) (3) for $x_1$ and $x_2$ equal to nonnegative integers;
$= 0$ otherwise.

(vi) diate А 2125 m za Jana. aeee for $x_1$ and $x_2$ equal to nonnegative integers;
$= 0$ otherwise.


CHAPTER 6


Normal, Poisson, and 
Related Probability Laws 


In applied probability theory the binomial, normal, and Poisson 
probability laws play a central role. In this chapter we discuss the reasons 
for the importance of the normal and Poisson probability laws. 


1. THE IMPORTANCE OF THE NORMAL PROBABILITY LAW 


The normal distribution function and the normal probability laws have
played a significant role in probability theory since the early eighteenth
century, and it is important to understand from what this significance
derives.

To begin with, there are random phenomena that obey a normal probability law precisely. One example of such a phenomenon is the velocity in any given direction of a molecule (with mass $M$) in a gas at absolute temperature $T$, which, according to Maxwell's law of velocities, obeys a normal probability law with parameters $m = 0$ and $\sigma^2 = KT/M$, where $K$ is the physical constant called Boltzmann's constant. However, with the exception of certain physical phenomena, there are not many random phenomena that obey a normal probability law precisely. Rather, the normal probability laws derive their importance from the fact that under various conditions they closely approximate many other probability laws.


238 NORMAL, POISSON, AND RELATED PROBABILITY LAWS CH. 6 


The normal distribution function was first encountered (in the work of 
de Moivre, 1733) as a means of giving an approximate evaluation of the 
distribution function of the binomial probability law with parameters 
n and p for large values of n. This fact is a special case of the famed 
central limit theorem of probability theory (discussed in Chapters 8 
and 10) which describes a very general class of random phenomena whose 
distribution functions may be approximated by the normal distribution 
function. 

A normal probability law has many properties that make it easy to 
manipulate. Consequently, for mathematical convenience one may often, 
in practice, assume that a random phenomenon obeys a normal probability 
law if its true probability law is specified by a probability density function 
of a shape similar to that of the normal density function, in the sense that 
it possesses a single peak about which it is approximately symmetrical. 
For example, the height of a human being appears to obey a probability 
law possessing an approximately bell-shaped probability density function. 
Consequently, one might assume that this quantity obeys a normal 
probability law in certain respects. However, care must be taken in using 
this approximation; for example, it is conceivable for a normally distributed
random quantity to take values between $-10^{9}$ and $-10^{200}$, although
the probability of its doing so may be exceedingly small. On the other 
hand, no man's height can assume such a large negative value. In this 
sense, it is incorrect to state that a man's height is approximately distributed 
in accordance with a normal probability law. One may, nevertheless, 
insist on regarding a man's height as obeying approximately a normal
probability law, in order to take advantage of the computational simplicity 
of the normal distribution. As long as the justification of this approxima- 
tion is kept clearly in mind, there does not seem to be too much danger in 
employing it. 

There is another sense in which a random phenomenon may approxi- 
mately obey a normal probability law. It may happen that the random 
phenomenon, which as measured does not obey a normal probability law, 
can, by a numerical transformation of the measurement, be cast into a
random phenomenon that does obey a normal probability law. For 
example, the cube root of the weight of an animal may obey a normal 
probability law (since the cube root of weight may be proportional to 
height) in a case in which the weight does not. 

Finally, the study of the normal density function is important even for 
the study of a random phenomenon that does not obey a normal probability
law, for under certain conditions its probability density function may be 
expanded in an infinite series of functions whose terms involve the 
successive derivatives of the normal density function. 


SEC. 2 THE APPROXIMATION OF THE BINOMIAL PROBABILITY LAW 239 


2. THE APPROXIMATION OF THE BINOMIAL PROBABILITY 
LAW BY THE NORMAL AND POISSON 
PROBABILITY LAWS 


Some understanding of the kinds of random phenomena that obey the 
normal probability law can be obtained by examining the manner in which 
the normal density function and the normal distribution function first arose 
in probability theory as means of approximately evaluating probabilities 
associated with the binomial probability law. 

The following theorem was stated by de Moivre in 1733 for the case
$p = \tfrac{1}{2}$ and proved for arbitrary values of $p$ by Laplace in 1812.

The probability that a random phenomenon obeying the binomial probability 
law with parameters n and p will have an observed value lying between a and b, 
inclusive, for any two integers a and b, is given approximately by 


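$$\sum_{k=a}^{b} \binom{n}{k} p^k q^{n-k} \approx \frac{1}{\sqrt{2\pi}} \int_{(a - np - \frac{1}{2})/\sqrt{npq}}^{(b - np + \frac{1}{2})/\sqrt{npq}} e^{-y^2/2}\, dy = \Phi\!\left(\frac{b - np + \frac{1}{2}}{\sqrt{npq}}\right) - \Phi\!\left(\frac{a - np - \frac{1}{2}}{\sqrt{npq}}\right). \tag{2.1}$$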


Before indicating the proof of this theorem, let us explain its meaning 
and usefulness by the following examples. 


► Example 2A. Suppose that $n = 6000$ tosses of a fair die are made. The probability that exactly $k$ of the tosses will result in a "three" is given by $\binom{6000}{k}\left(\tfrac{1}{6}\right)^k\left(\tfrac{5}{6}\right)^{6000-k}$. The probability that the number of tosses on which a "three" will occur is between 980 and 1030, inclusive, is given by the sum

$$\sum_{k=980}^{1030} \binom{6000}{k} \left(\frac{1}{6}\right)^k \left(\frac{5}{6}\right)^{6000-k}. \tag{2.2}$$

It is clearly quite laborious to evaluate this sum directly. Fortunately, by (2.1), with $np = 1000$ and $\sqrt{npq} = 28.87$, the sum in (2.2) is approximately equal to

$$\frac{1}{\sqrt{2\pi}} \int_{(980 - 1000 - \frac{1}{2})/28.87}^{(1030 - 1000 + \frac{1}{2})/28.87} e^{-y^2/2}\, dy = \Phi(1.06) - \Phi(-0.71) = 0.617.$$

◄
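The quality of the approximation in Example 2A can be checked by computing the sum (2.2) directly. This Python sketch (helper names hypothetical) evaluates each binomial term through `math.lgamma`, since $(5/6)^{6000}$ underflows ordinary floating point, and compares the total with the right-hand side of (2.1).

```python
import math
from statistics import NormalDist

def binom_pmf(k: int, n: int, p: float) -> float:
    """C(n, k) p^k (1-p)^(n-k) via log-gamma; direct powers such as (5/6)**6000 would underflow."""
    return math.exp(math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * math.log(p) + (n - k) * math.log1p(-p))

def normal_approx(a: int, b: int, n: int, p: float) -> float:
    """Right-hand side of (2.1), including the 1/2 continuity correction."""
    m, s = n * p, math.sqrt(n * p * (1 - p))
    Phi = NormalDist().cdf
    return Phi((b - m + 0.5) / s) - Phi((a - m - 0.5) / s)

n, p = 6000, 1 / 6
exact = sum(binom_pmf(k, n, p) for k in range(980, 1031))   # the sum (2.2)
approx = normal_approx(980, 1030, n, p)
print(f"exact sum (2.2) = {exact:.4f}, normal approximation (2.1) = {approx:.4f}")
```

The small gap between 0.617 in the example and the value printed here comes from rounding the limits to two decimals before using a table of $\Phi$.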


Fig. 2A. Graphs of the binomial probability mass function p(x) for p = 1/2 and n = 50 and 100.

Fig. 2B. Graphs of the probability mass function p*(h) for p = 1/2 and n = 50 and 100.


► Example 2B. In 40,000 independent tosses of a coin heads appeared 20,400 times. Find the probability that if the coin were fair one would observe in 40,000 independent tosses (i) 20,400 or more heads, (ii) between 19,600 and 20,400 heads.

Solution: Let $X$ be the number of heads in 40,000 independent tosses of a fair coin. Then $X$ obeys a binomial probability law with mean $m = np = 20{,}000$, variance $\sigma^2 = npq = 10{,}000$, and standard deviation $\sigma = 100$. According to the normal approximation to the binomial probability law, $X$ approximately obeys a normal probability law with parameters $m = 20{,}000$ and $\sigma = 100$ [in making this statement we are ignoring the terms in (2.1) involving $\frac{1}{2}$, which are known as a "continuity" correction]. Since 20,400 is four standard deviations more than the mean of the probability law of $X$, the probability is approximately 0 that one would observe a value of $X$ of 20,400 or more. Similarly, the probability is approximately 1 that one would observe a value of $X$ between 19,600 and 20,400. ◄
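The sketch below puts numbers on "approximately 0" and "approximately 1" in Example 2B by summing the exact binomial probabilities (again via `math.lgamma`) and comparing with $1 - \Phi(4)$, the normal value without the continuity correction. The variable names are illustrative.

```python
import math
from statistics import NormalDist

N, P = 40_000, 0.5

def log_pmf(k: int) -> float:
    """log of C(N, k) P^k (1-P)^(N-k), via log-gamma."""
    return (math.lgamma(N + 1) - math.lgamma(k + 1) - math.lgamma(N - k + 1)
            + k * math.log(P) + (N - k) * math.log(1 - P))

exact_upper = sum(math.exp(log_pmf(k)) for k in range(20_400, N + 1))     # part (i)
exact_middle = sum(math.exp(log_pmf(k)) for k in range(19_600, 20_401))   # part (ii)
normal_upper = 1 - NormalDist(20_000, 100).cdf(20_400)                    # 1 - Phi(4)
print(exact_upper, normal_upper, exact_middle)
```

Both tail values are of the order of $3 \times 10^{-5}$, so observing 20,400 heads casts serious doubt on the fairness of the coin.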


In order to have a convenient language in which to discuss the proof of (2.1), let us suppose that we are observing the number $X$ of successes in $n$ independent repeated Bernoulli trials with probability $p$ of success at each trial. Next, to each outcome $X$ let us compute the quantity

$$h = \frac{X - np}{\sqrt{npq}}, \tag{2.3}$$

which represents the deviation of $X$ from $np$ divided by $\sqrt{npq}$. Recall that the quantities $np$ and $\sqrt{npq}$ are equal, respectively, to the mean and standard deviation of the binomial probability law. The deviation $h$, defined by (2.3), is a random quantity obeying a discrete probability law specified by a probability mass function $p^*(h)$, which may be given in terms of the probability mass function $p(x)$ by

$$p^*(h) = p(h\sqrt{npq} + np). \tag{2.4}$$

Equation (2.4) expresses the fact that for any given real number $h$ the probability that the deviation (of the number of successes from $np$, divided by $\sqrt{npq}$) will be equal to $h$ is the same as the probability that the number of successes will be equal to $h\sqrt{npq} + np$. The advantage of the probability mass function $p^*(h)$ over the original probability mass function $p(x)$ can be seen by comparing their graphs, which are given in Figs. 2A and 2B for $n = 50$ and 100. The graph of the function $p(x)$ spreads out more and more widely along the x-axis as the number $n$ of trials increases.




The graphs of the functions $p^*(h)$, on the other hand, are very similar to each other for different values of $n$, and seem to approach a limit as $n$ becomes infinite. One might therefore expect to find a smooth curve that would be a good approximation to $p^*(h)$, at least when the number of trials $n$ is large. We now show that this is indeed possible. More precisely, we show that if $h$ is a real number for which $p^*(h) > 0$, then, approximately,

$$p^*(h) \approx \frac{1}{\sqrt{npq}}\, \frac{1}{\sqrt{2\pi}}\, e^{-h^2/2}, \tag{2.5}$$

in the sense that

$$\lim_{n \to \infty} \frac{p(h\sqrt{npq} + np)}{\dfrac{1}{\sqrt{2\pi npq}}\, e^{-h^2/2}} = 1. \tag{2.6}$$


To prove (2.6), we first obtain the approximate expression for $p(x)$; for $x = 1, 2, \ldots, n - 1$,

$$p(x) = \frac{1}{\sqrt{2\pi}} \sqrt{\frac{n}{x(n - x)}} \left(\frac{np}{x}\right)^{x} \left(\frac{nq}{n - x}\right)^{n - x} e^{R}, \tag{2.7}$$

in which $|R| < \frac{1}{12}\left(\frac{1}{n} + \frac{1}{x} + \frac{1}{n - x}\right)$. Equation (2.7) is an immediate consequence of the approximate expression for the binomial coefficient $\binom{n}{k}$; for any integers $n$ and $k = 1, 2, \ldots, n - 1$,

$$\binom{n}{k} = \frac{n!}{k!\,(n - k)!} = \frac{1}{\sqrt{2\pi}} \sqrt{\frac{n}{k(n - k)}} \left(\frac{n}{k}\right)^{k} \left(\frac{n}{n - k}\right)^{n - k} e^{R}. \tag{2.8}$$

Equation (2.8), in turn, is an immediate consequence of the approximate expression for $n!$ given by Stirling's formula; for any integer $n = 1, 2, \ldots$,

$$n! = \sqrt{2\pi n}\, n^{n} e^{-n} e^{r(n)}, \qquad 0 < r(n) < \frac{1}{12n}. \tag{2.9}$$
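The bound in (2.9) is easy to check numerically. This sketch computes $r(n)$ from the exact integer $n!$ (Python's `math.factorial`) and compares it with $1/(12n)$. It stops at $n = 100$ so that the gap between $r(n)$ and $1/(12n)$, which shrinks roughly like $1/(360n^3)$, stays well above floating-point rounding error.

```python
import math

def stirling_remainder(n: int) -> float:
    """r(n) in (2.9): log n! minus log(sqrt(2*pi*n) * n^n * e^(-n))."""
    return math.log(math.factorial(n)) - (0.5 * math.log(2 * math.pi * n) + n * math.log(n) - n)

remainders = {n: stirling_remainder(n) for n in (1, 2, 5, 10, 50, 100)}
for n, r in remainders.items():
    print(f"n={n:>3}  r(n)={r:.8f}  1/(12n)={1 / (12 * n):.8f}")
```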


In (2.7) let $x = np + h\sqrt{npq}$. Then $n - x = nq - h\sqrt{npq}$, and

$$\frac{x(n - x)}{n} = n\left(p + h\sqrt{pq/n}\right)\left(q - h\sqrt{pq/n}\right) \approx npq. \tag{2.10}$$


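Then, using the expansion $\log_e(1 + x) = x - \frac{1}{2}x^2 + \theta x^3$ for some $\theta$ such that $|\theta| < 1$, valid for $|x| < \frac{1}{2}$, we obtain that

$$\begin{aligned}
-\log_e\left(\sqrt{2\pi npq}\; p^*(h)\right) &\approx (np + h\sqrt{npq}) \log_e\left(1 + h\sqrt{q/np}\right) + (nq - h\sqrt{npq}) \log_e\left(1 - h\sqrt{p/nq}\right) \\
&= (np + h\sqrt{npq})\left(h\sqrt{\frac{q}{np}} - \frac{h^2 q}{2np} + \text{terms in } n^{-3/2}\right) \\
&\quad + (nq - h\sqrt{npq})\left(-h\sqrt{\frac{p}{nq}} - \frac{h^2 p}{2nq} + \text{terms in } n^{-3/2}\right) \\
&= \left(h\sqrt{npq} - \tfrac{1}{2} q h^2 + q h^2 + \text{terms in } n^{-1/2}\right) \\
&\quad + \left(-h\sqrt{npq} - \tfrac{1}{2} p h^2 + p h^2 + \text{terms in } n^{-1/2}\right) \\
&= \tfrac{1}{2} h^2 + \text{terms in } n^{-1/2}.
\end{aligned} \tag{2.11}$$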

If we ignore all terms that tend to 0 as $n$ tends to infinity in (2.11), we
obtain the desired conclusion, namely (2.6).
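Numerically, the ratio in (2.6) does drift toward 1. The sketch below (hypothetical helper name; $p = 0.3$ and $h = 1$ chosen for illustration) evaluates the ratio at the lattice point of the form (2.12) nearest to $np + h\sqrt{npq}$, using the value of $h$ actually attained there, and works in logarithms to avoid overflow.

```python
import math

def ratio_26(h: float, n: int, p: float) -> float:
    """Ratio in (2.6): exact p(x) over (1/sqrt(2 pi npq)) exp(-h^2/2), at a lattice point."""
    q = 1 - p
    s = math.sqrt(n * p * q)
    x = round(n * p + h * s)              # nearest point of the form (2.12)
    h_x = (x - n * p) / s                 # the value of h actually attained at x
    log_exact = (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
                 + x * math.log(p) + (n - x) * math.log(q))
    log_approx = -0.5 * h_x * h_x - 0.5 * math.log(2 * math.pi * n * p * q)
    return math.exp(log_exact - log_approx)

ratios = {n: ratio_26(1.0, n, 0.3) for n in (100, 1_000, 10_000, 100_000)}
for n, r in ratios.items():
    print(f"n={n:>6}  ratio={r:.5f}")
```

The departure from 1 shrinks roughly like $1/\sqrt{n}$, matching the "terms in $n^{-1/2}$" of (2.11).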

From (2.6) one may obtain a proof of (2.1). However, in this book we
give only a heuristic geometric proof that (2.6) implies (2.1). For an
elementary rigorous proof of (2.1) the reader should consult J. Neyman,
First Course in Probability and Statistics, New York, Henry Holt, 1950,
pp. 234-242. In Chapter 10 we give a rigorous proof of (2.1) by using the
method of characteristic functions.
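A geometric derivation of (2.1) from (2.6) is as follows. First plot $p^*(h)$ in Fig. 2B as a function of $h$; note that $p^*(h) = 0$ for all points $h$, except those that may be represented in the form

$$h = \frac{k - np}{\sqrt{npq}} \tag{2.12}$$

for some integer $k = 0, 1, \ldots, n$. Next, as in Fig. 2C, plot $p^*(h)$ by a series of rectangles of height $(1/\sqrt{2\pi})e^{-h^2/2}$ and width $1/\sqrt{npq}$ (the spacing between adjacent points of the form (2.12)), centered at all points $h$ of the form of (2.12); by (2.5), the area of each rectangle is approximately $p^*(h)$.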


From Fig. 2C we obtain (2.1). It is clear that

$$\sum_{k=a}^{b} \binom{n}{k} p^k q^{n-k} = \sum_{\substack{\text{over } h \text{ of the form of} \\ (2.12) \text{ for } k = a, \ldots, b}} p^*(h),$$

which is approximately equal to the sum of the areas of the rectangles in Fig. 2C centered at the points $h$ of the form of (2.12), corresponding to the integers $k$ from $a$ to $b$, inclusive. Now, the sum of the areas of these rectangles is an approximating sum to the integral of the function $(1/\sqrt{2\pi})e^{-y^2/2}$ between the limits $(a - np - \frac{1}{2})/\sqrt{npq}$ and $(b - np + \frac{1}{2})/\sqrt{npq}$. We have thus obtained the approximate formula (2.1).


Fig. 2C. The normal approximation to the binomial probability law. The continuous curve represents the normal density function. The area of each rectangle represents the approximate value given by (2.5) for the value of the probability mass function p*(h) at the mid-point of the base of the rectangle.


It should be noted that we have available two approximations to the probability mass function $p(x)$ of the binomial probability law. From (2.5) and (2.6) it follows that

$$p(x) \approx \frac{1}{\sqrt{2\pi npq}} \exp\left[-\frac{1}{2}\, \frac{(x - np)^2}{npq}\right], \tag{2.13}$$

whereas from (2.1) one obtains, setting $a = b = x$,

$$p(x) \approx \frac{1}{\sqrt{2\pi}} \int_{(x - np - \frac{1}{2})/\sqrt{npq}}^{(x - np + \frac{1}{2})/\sqrt{npq}} e^{-y^2/2}\, dy. \tag{2.14}$$

In using any approximation formula, such as that given by (2.1), it is important to have available "remainder terms" for the determination of the accuracy of the approximation formula. Analytic expressions for the remainder terms involved in the use of (2.1) are to be found in J. V. Uspensky, Introduction to Mathematical Probability, McGraw-Hill, New York, 1937, p. 129, and W. Feller, "On the normal approximation to the binomial distribution," Annals of Mathematical Statistics, Vol. 16 (1945), pp. 319-329. However, these expressions do not lead to conclusions that are easy to state. A booklet entitled Binomial, Normal, and Poisson Probabilities, by Ed. Sinclair Smith (published by the author in 1953 at Bel Air, Maryland), gives extensive advice on how to compute binomial probabilities expeditiously with 3-decimal accuracy. Smith (p. 38) states that (2.1) gives 2-decimal accuracy or better if np > 37. The accuracy of the approximation is much better for p close to 0.5, in which case 2-decimal accuracy is obtained with n as small as 3.

In treating problems in this book, the student will not be seriously wrong
if he uses the normal approximation to the binomial probability law in
cases in which $np(1 - p) \ge 3$.
Extensive tables of the binomial distribution function

$$F_B(x; n, p) = \sum_{k=0}^{x} \binom{n}{k} p^k q^{n-k}, \qquad x = 0, 1, \ldots, n, \tag{2.15}$$

have become available in recent years. The Tables of the Binomial Probability Distribution, National Bureau of Standards, Applied Mathematics Series 6, Washington, 1950, give $1 - F_B(x; n, p)$ to seven decimal places for p = 0.01(0.01)0.50 and n = 2(1)49. These tables are extended in H. G. Romig, 50-100 Binomial Tables, Wiley, New York, 1953, in which $F_B(x; n, p)$ is tabulated for n = 50(5)100 and p = 0.01(0.01)0.50. A more extensive tabulation of $F_B(x; n, p)$ for n = 1(1)50(2)100(10)200(20)500(50)1000 and p = 0.01(0.01)0.50, and also for p equal to certain other fractional values, is available in Tables of the Cumulative Binomial Probability Distribution, Harvard University Press, 1955.
The Poisson Approximation to the Binomial Probability Law. The Poisson approximation, whose proof and usefulness were indicated in section 3 of Chapter 3, states that

$$\binom{n}{k} p^k q^{n-k} \approx e^{-np} \frac{(np)^k}{k!}. \tag{2.16}$$

The Poisson approximation applies when the binomial probability law is very far from being bell shaped.
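A quick numerical look at (2.16) for a few illustrative pairs $(n, p)$ with $np = 2$: the sketch computes the largest pointwise difference between the binomial and Poisson probabilities. As an independent sanity check it is compared with Le Cam's inequality, which bounds the sum of all these differences by $2np^2$.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """C(n, k) p^k (1-p)^(n-k), evaluated via log-gamma."""
    return math.exp(math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * math.log(p) + (n - k) * math.log1p(-p))

def poisson_pmf(k: int, lam: float) -> float:
    """e^{-lam} lam^k / k!, evaluated via log-gamma."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def max_abs_difference(n: int, p: float) -> float:
    """Largest |binomial - Poisson| over k = 0..n, i.e. the error in (2.16)."""
    return max(abs(binom_pmf(k, n, p) - poisson_pmf(k, n * p)) for k in range(n + 1))

for n, p in ((100, 0.02), (1000, 0.002), (10_000, 0.0002)):
    print(n, p, max_abs_difference(n, p))
```

Holding $np$ fixed while $p$ shrinks is exactly the regime in which the Poisson approximation is meant to be used.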


It may happen that $p$ is very small, so that the Poisson approximation may be used, but that $n$ is so large that the normal approximation may also be used. This implies that for large values of $\lambda = np$ the Poisson law and the normal law approximate each other. In theoretical exercise 2.1 it is shown directly that the Poisson probability law with parameter $\lambda$ may be approximated by the normal probability law for large values of $\lambda$.


► Example 2C. A telephone trunking problem. Suppose you are designing
the physical premises of a newly organized research laboratory. Since 
there will be a large number of private offices in the laboratory, there will 
also be a large number n of individual telephones, each connecting to a 
central laboratory telephone switchboard. The question arises: how 
many outside lines will the switchboard require to establish a fairly high 
probability, say 95%, that any person who desires the use of an outside 
telephone line (whether on the outside of the laboratory calling in or on 
the inside of the laboratory calling out) will find one immediately available?

Solution: We begin by regarding the problem as one involving independent Bernoulli trials. We suppose that for each telephone in the laboratory, say the jth telephone, there is a probability $p_j$ that an outside line will be required (either as the result of an incoming call or an outgoing call). One could estimate $p_j$ by observing in the course of an hour how many minutes $h_j$ an outside line is engaged, and estimating $p_j$ by the ratio $h_j/60$. In order to have repeated Bernoulli trials, we assume $p_1 = p_2 = \cdots = p_n = p$. We next assume independence of the $n$ events $A_1, A_2, \ldots, A_n$, in which $A_j$ is the event that the jth telephone demands an outside line at the moment of time at which we are regarding the laboratory. The probability that exactly $k$ outside lines will be in demand at a given moment is, by the binomial law, given by $\binom{n}{k} p^k q^{n-k}$.


Consequently, if we let K denote the number of outside lines connected
to the laboratory switchboard and make the assumption that they are all
free at the moment at which we are regarding the laboratory, then the
probability that a person desiring an outside line will find one available
is the same as the probability that the number of outside lines demanded
at that moment is less than or equal to K, which is equal to

(2.18) $$\sum_{k=0}^{K}\binom{n}{k}p^k(1-p)^{n-k} \approx \sum_{k=0}^{K}e^{-np}\frac{(np)^k}{k!} \approx \Phi\!\left(\frac{K-np+\frac{1}{2}}{\sqrt{np(1-p)}}\right)-\Phi\!\left(\frac{-np-\frac{1}{2}}{\sqrt{np(1-p)}}\right),$$


where the first equality sign in (2.18) holds if the Poisson approximation 


SEC. 2 THE APPROXIMATION OF THE BINOMIAL PROBABILITY LAW 247 


to the binomial applies and the second equality sign holds if the normal 
approximation to the binomial applies. 
Define, for any λ > 0 and integer n = 0, 1, ...,

(2.19) $$F_P(n;\lambda)=\sum_{k=0}^{n}e^{-\lambda}\frac{\lambda^k}{k!},$$


which is essentially the distribution function of the Poisson probability
law with parameter λ. Next, define for P such that 0 < P < 1 the symbol
K(P) to denote the P-percentile of the normal distribution function,
defined by

(2.20) $$\Phi(K(P))=\int_{-\infty}^{K(P)}\phi(x)\,dx=P.$$


One may give the following expressions for the minimum number K of
outside lines that should be connected to the laboratory switchboard in order
to have a probability greater than a preassigned level P_0 that all demands for
outside lines can be handled. Depending on whether the Poisson or the
normal approximation applies, K is the smallest integer such that

(2.21) $$F_P(K;np)\ge P_0,$$

(2.22) $$K\ge K(P_0)\sqrt{np(1-p)}+np-\tfrac{1}{2}.$$


In writing (2.22), we are approximating $\Phi\big((-np-\tfrac12)/\sqrt{npq}\big)$ by 0, since

$$\frac{-np-\frac{1}{2}}{\sqrt{npq}}\le-\sqrt{npq}\le-4\quad\text{if } npq\ge 16.$$

The value of K(P) can be determined from Table I (see p. 441). In
particular,

(2.23) $$K(0.95)=1.645,\qquad K(0.99)=2.326.$$


The solution K of the inequality (2.21) can be read from tables prepared by
E. C. Molina (published in a book entitled Poisson's Exponential Binomial
Limit, Van Nostrand, New York, 1942), which tabulate, to six decimal
places, the function

(2.24) $$\sum_{k=c}^{\infty}e^{-\lambda}\frac{\lambda^k}{k!}$$

for about 300 values of λ in the interval 0.001 < λ < 100.
The value of K determined by (2.21) and (2.22) is given in Table 2A
for p = 1/30, 1/10, 1/3, n = 90, 900, and P_0 = 0.95, 0.99.


TABLE 2A

THE VALUES OF K DETERMINED BY (2.21) AND (2.22)

                            p = 1/30          p = 1/10            p = 1/3
Approximation           Poisson  Normal   Poisson  Normal    Poisson      Normal

P_0 = 0.95   n = 90         6      5.3       14     13.2        39         36.9
             n = 900       39     38.4      106    104.3    [illegible]   322.8

P_0 = 0.99   n = 90         8      6.5       17     15.1        43         39.9
             n = 900       43     42.0      113    110.4    [illegible]   332.4
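Both columns of Table 2A can be recomputed directly. The following sketch (it assumes Python 3.8 or later for `statistics.NormalDist`) finds the Poisson entry as the smallest integer K with F_P(K; np) ≥ P_0 by accumulating the terms of (2.19), and evaluates the right-hand side of (2.22) with the exact normal percentile in place of the rounded constants of (2.23). The function names `k_poisson` and `k_normal` are hypothetical.

```python
import math
from statistics import NormalDist


def k_poisson(n: int, p: float, P0: float) -> int:
    """Smallest integer K with F_P(K; np) >= P0, as in (2.21)."""
    lam = n * p
    term = math.exp(-lam)  # k = 0 term of (2.19)
    cdf = term
    K = 0
    while cdf < P0:
        K += 1
        term *= lam / K  # turns e^{-lam} lam^(K-1)/(K-1)! into e^{-lam} lam^K/K!
        cdf += term
    return K


def k_normal(n: int, p: float, P0: float) -> float:
    """Right-hand side of (2.22): K(P0) * sqrt(np(1-p)) + np - 1/2."""
    z = NormalDist().inv_cdf(P0)  # the percentile K(P0) defined in (2.20)
    return z * math.sqrt(n * p * (1 - p)) + n * p - 0.5


for P0 in (0.95, 0.99):
    for n in (90, 900):
        row = [(k_poisson(n, p, P0), round(k_normal(n, p, P0), 1)) for p in (1 / 30, 1 / 10, 1 / 3)]
        print(f"P0 = {P0}, n = {n}: {row}")
```

Each printed pair is (Poisson K, normal right-hand side); the entries that are legible in Table 2A can be compared against this output.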


THEORETICAL EXERCISES 


2.1. Normal approximation to the Poisson probability law. Consider a random
phenomenon obeying a Poisson probability law with parameter λ. To
an observed outcome X of the random phenomenon, compute h =
(X − λ)/√λ, which represents the deviation of X from λ, divided by √λ.
The quantity h is a random quantity obeying a discrete probability law
specified by a probability mass function p*(h), which may be given in
terms of the probability mass function p(x) by p*(h) = p(h√λ + λ). In the
same way that (2.6), (2.1), and (2.13) are proved, show that for fixed values
of a, b, and h, the following differences tend to 0 as λ tends to infinity:

$$\sqrt{\lambda}\,p^*(h)-\frac{1}{\sqrt{2\pi}}e^{-h^2/2}\to 0,$$

(2.25) $$\sum_{k=a}^{b}e^{-\lambda}\frac{\lambda^k}{k!}-\frac{1}{\sqrt{2\pi}}\int_{(a-\lambda-\frac12)/\sqrt{\lambda}}^{(b-\lambda+\frac12)/\sqrt{\lambda}}e^{-y^2/2}\,dy\to 0.$$


2.2. A competition problem. Suppose that m restaurants compete for the same
n patrons. Show that the number of seats that each restaurant should have
in order to have a probability greater than P_0 that it can serve all patrons
who come to it (assuming that all the patrons arrive at the same time and
choose, independently of one another, each restaurant with probability
p = 1/m) is given by (2.22), with p = 1/m. Compute K for m = 2, 3, 4
and P_0 = 0.75 and 0.95. Express in words how the size of a restaurant
(represented by K) depends on the size of its market (represented by n),
the number of its competitors (represented by m), and the share of the
market it desires (represented by P_0).
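Under the normal approximation, the recipe of the competition problem amounts to taking K as the smallest integer not less than the right-hand side of (2.22) with p = 1/m. The sketch below assumes that reading. The function name `seats_needed` is hypothetical, and the market size n = 800 is only an illustrative value (it is borrowed from the restaurant exercise in the next set), since the theoretical exercise leaves n general.

```python
import math
from statistics import NormalDist


def seats_needed(n: int, m: int, P0: float) -> int:
    """Smallest integer K satisfying (2.22) with p = 1/m (normal approximation)."""
    p = 1.0 / m
    z = NormalDist().inv_cdf(P0)  # the percentile K(P0) of (2.20)
    bound = z * math.sqrt(n * p * (1 - p)) + n * p - 0.5
    return math.ceil(bound)


n = 800  # illustrative market size
for P0 in (0.75, 0.95):
    print(P0, [seats_needed(n, m, P0) for m in (2, 3, 4)])
```

Varying n, m, and P0 in the loop is a quick way to see the dependence the exercise asks to be put into words.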


EXERCISES 


2.1. In 10,000 independent tosses of a coin 5075 heads were observed. Find
approximately the probability of observing (i) exactly 5075 heads, (ii) 5075
or more heads if the coin (a) is fair, (b) has probability 0.51 of falling heads.

2.2. Consider a room in which 730 persons are assembled. For i = 1, 2, ...,
730, let A_i be the event that the ith person was born on January 1. Assume
that the events A_1, ..., A_730 are independent and that each event has
probability equal to 1/365. Find approximately the probability that
(i) exactly 2, (ii) 2 or more persons were born on January 1. Compare the
answers obtained by using the normal and Poisson approximations to the
binomial law.
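One way to carry out the comparison asked for in the January 1 exercise is to compute the exact binomial probabilities next to both approximations. In the sketch below the continuity correction in the normal column is an assumption chosen to match (2.18), and the helper names are illustrative.

```python
import math

n, p = 730, 1 / 365
lam = n * p                                   # Poisson parameter np = 2
mean, sd = n * p, math.sqrt(n * p * (1 - p))  # binomial mean and standard deviation


def std_normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binom_pmf(k: int) -> float:
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int) -> float:
    return math.exp(-lam) * lam**k / math.factorial(k)


def normal_pmf(k: int) -> float:
    # continuity correction: P[k - 1/2 < Y <= k + 1/2]
    return std_normal_cdf((k + 0.5 - mean) / sd) - std_normal_cdf((k - 0.5 - mean) / sd)


exact_2, poisson_2, normal_2 = binom_pmf(2), poisson_pmf(2), normal_pmf(2)
exact_ge2 = 1 - binom_pmf(0) - binom_pmf(1)
poisson_ge2 = 1 - poisson_pmf(0) - poisson_pmf(1)
normal_ge2 = 1 - std_normal_cdf((1.5 - mean) / sd)

print(f"P[X = 2]:  exact {exact_2:.4f}  Poisson {poisson_2:.4f}  normal {normal_2:.4f}")
print(f"P[X >= 2]: exact {exact_ge2:.4f}  Poisson {poisson_ge2:.4f}  normal {normal_ge2:.4f}")
```

Because p is small and np is only 2, the Poisson column lies closer to the exact values than the normal column for both events.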
2.3. Plot the probability mass function of the binomial probability law with
parameters n = 10 and p = ½ against its normal approximation. In
your opinion, is the approximation close enough for practical purposes?


2.4. Consider an urn that contains 10 balls, numbered 0 to 9, each of which is
equally likely to be drawn; thus choosing a ball from the urn is equivalent
to choosing a number 0 to 9, and one sometimes describes this experiment
by saying that a random digit has been chosen. Now let n balls be chosen
with replacement. Find the probability that among the numbers thus
chosen the number 7 will appear between (n − 3√n)/10 times and
(n + 3√n)/10 times, inclusive, if (i) n = 10, (ii) n = 100, (iii) n = 10,000.
Compute the answers exactly or by means of the normal and Poisson
approximations to the binomial probability law.

2.5. Find the probability that in 3600 independent repeated trials of an
experiment, in which the probability of success of each trial is p, the number
of successes is between 3600p − 20 and 3600p + 20, inclusive, if (i) p = [illegible],
(ii) p = [illegible].

2.6. A certain corporation has 90 junior executives. Assume that the
probability is [illegible] that an executive will require the services of a
secretary at the beginning of the business day. If the probability is to be
0.95 or greater that a secretary will be available, how many secretaries
should be hired to constitute a pool of secretaries for the group of 90
executives?

2.7. Suppose that (i) 2, (ii) 3 restaurants compete for the same 800 patrons.
Find the number of seats that each restaurant should have in order to
have a probability greater than 95% that it can serve all patrons who
come to it (assuming that all patrons arrive at the same time and choose,
independently of one another, each restaurant with equal probability).




2.8. At a certain men's college the probability that a student selected at random
on a given day will require a hospital bed is 1/5000. If there are 8000
students, how many beds should the hospital have so that the probability
that a student will be turned away for lack of a bed is less than 1% (in
other words, find K so that P[X > K] < 0.01, where X is the number of
students requiring beds)?


2.9. Consider an experiment in which the probability of success at each trial
is p. Let X denote the number of successes in n independent trials of the
experiment. Show that

$$P\big[|X-np|\le 1.96\sqrt{npq}\big]\approx 95\%.$$

Consequently, if p = 0.5, with probability approximately equal to 0.95,
the observed number X of successes in n independent trials will satisfy
the inequalities

(2.26) $$0.5n-0.98\sqrt{n}\le X\le 0.5n+0.98\sqrt{n}.$$

Determine how large n should be, under the assumption that (i) p = 0.4,
(ii) p = 0.6, (iii) p = 0.7, to have a probability of 5% that the observed
number X of successes in the n trials will satisfy (2.26).


2.10. In his book Natural Inheritance, p. 63, F. Galton in 1889 described an
apparatus known today as Galton's quincunx. The apparatus consists
of a board in which nails are arranged in rows, the nails of a given row
being placed below the mid-points of the intervals between the nails in the
row above. Small steel balls of equal diameter are poured into the
apparatus through a funnel located opposite the central pin of the first
row. As they run down the board, the balls are "influenced" by the nails
in such a manner that, after passing through the last row, they take up
positions deviating from the point vertically below the central pin of the
first row. Let us call this point x = 0. Assume that the distance between
2 neighboring pins is taken to be 1 and that the diameter of the balls is
slightly smaller than 1. Assume that in passing from 1 row to the next the
abscissa (x-coordinate) of a ball changes by either ½ or −½, each possibility
having equal probability. To each opening in a row of nails, assign as its
abscissa the mid-point of the interval between the 2 nails. If there is an
even number of rows of nails, then the openings in the last row will have
abscissas 0, ±1, ±2, .... Assuming that there are 36 rows of nails, find
for k = 0, ±1, ±2, ..., ±10 the probability that a ball inserted in the
funnel will pass through the opening in the last row which has abscissa k.
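Under the stated assumptions each of the 36 passages moves a ball by +½ or −½ with equal probability, so the final abscissa is R − 18, where R, the number of +½ moves, obeys a binomial law with n = 36 and p = ½. The sketch below tabulates these exact probabilities next to a normal approximation with mean 0 and variance 36 × ¼ = 9 (the continuity correction is our assumption). It is one way to check an answer, not the book's solution.

```python
import math

ROWS = 36  # number of rows of nails (even, so the final abscissas are integers)


def std_normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def exact_prob(k: int) -> float:
    """P[final abscissa = k], where abscissa = R - ROWS/2 and R ~ Binomial(ROWS, 1/2)."""
    r = k + ROWS // 2
    if not 0 <= r <= ROWS:
        return 0.0
    return math.comb(ROWS, r) / 2**ROWS


def normal_prob(k: int) -> float:
    """Normal approximation with mean 0 and variance ROWS * (1/2)^2, continuity corrected."""
    sd = math.sqrt(ROWS) / 2
    return std_normal_cdf((k + 0.5) / sd) - std_normal_cdf((k - 0.5) / sd)


for k in range(0, 11):
    print(f"k = ±{k:2d}: exact {exact_prob(k):.5f}  normal {normal_prob(k):.5f}")
```

The law is symmetric, so the printed value for k also applies to −k.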


2.11. Consider a liquid of volume V, which contains N bacteria. Let the liquid
be vigorously shaken and part of it transferred to a test tube of volume v.
Suppose that (i) the probability p that any given bacterium will be transferred
to the test tube is equal to the ratio of the volumes v/V and that (ii)
the appearance of 1 particular bacterium in the test tube is independent of
the appearance of the other N − 1 bacteria. Consequently, the number
of bacteria in the test tube is a numerical valued random phenomenon
obeying a binomial probability law with parameters N and p = v/V.
Let m = N/V denote the average number of bacteria per unit volume. Let
the volume v of the test tube be equal to 3 cubic centimeters.


SEC. 3 THE POISSON PROBABILITY LAW 251 


(i) Assume that the volume v of the test tube is very small compared to the
volume V of liquid, so that p = v/V is a small number. In particular,
assume that p = 0.001 and that the bacterial density m = 2 bacteria per
cubic centimeter. Find approximately the probability that the number
of bacteria in the test tube will be greater than 1.

(ii) Assume that the volume v of the test tube is comparable to the volume
V of the liquid. In particular, assume that V = 12 cubic centimeters and
N = 10,000. What is the probability that the number of bacteria in the
test tube will be between 2400 and 2600, inclusive?


2.12. Suppose that among 10,000 students at a certain college 100 are red-haired.
(i) What is the probability that a sample of 100 students, selected with
replacement, will contain at least one red-haired student?
(ii) How large must a random sample, drawn with replacement, be if the
probability of its containing a red-haired student is to be 0.95?
It would be more realistic to assume that the sample is drawn without
replacement. Would the answers to (i) and (ii) change if this assumption
were made? Hint: State conditions under which the hypergeometric
law is approximated by the Poisson law.


2.13. Let S be the observed number of successes in n independent repeated
Bernoulli trials with probability p of success at each trial. For each of
the following events, find (i) its exact probability calculated by use of the
binomial probability law, (ii) its approximate probability calculated by
use of the normal approximation, (iii) the percentage error involved in
using (ii) rather than (i).

              n     p     The event that
  (i)         4    0.3    S ≤ 2
  (ii)        9    0.7    S ≥ 6
  (iii)       9    0.7    [illegible]
  (iv)       16    0.4    2 < S < 10
  (v)        16    0.2    S < 2
  (vi)       25    0.9    S < 20
  (vii)      25    0.3    [illegible] < 10
  (viii)     49    0.2    S ≤ 4
  (ix)       49    0.2    S ≥ 8
  (x)        49    0.2    S < 16
  (xi)      100    0.5    S < 10
  (xii)     100    0.5    S > 40
  (xiii)    100    0.5    S = 50
  (xiv)     100    0.5    S = 60


3. THE POISSON PROBABILITY LAW 


The Poisson probability law has become increasingly important in recent
years as more and more random phenomena to which the law applies have
been studied. In physics the random emission of electrons from the filament
of a vacuum tube, or from a photosensitive substance under the
influence of light, and the spontaneous decomposition of radioactive
atomic nuclei lead to phenomena obeying a Poisson probability law. This
law arises frequently in the fields of operations research and management




science, since demands for service, whether upon the cashiers or salesmen 
of a department store, the stock clerk of a factory, the runways of an 
airport, the cargo-handling facilities of a port, the maintenance man of a 
machine shop, and the trunk lines of a telephone exchange, and also the 
rate at which service is rendered, often lead to random phenomena either 
exactly or approximately obeying a Poisson probability law. Such random 
phenomena also arise in connection with the occurrence of accidents, 
errors, breakdowns, and other similar calamities. 

The kinds of random phenomena that lead to a Poisson probability
law can best be understood by considering the kinds of phenomena that
lead to a binomial probability law. The usual situation to which the
binomial probability law applies is one in which n independent occurrences
of some experiment are observed. One may then determine (i) the number
of trials on which a certain event occurred and (ii) the number of trials on
which the event did not occur. There are random events, however, that
do not occur as the outcomes of definite trials of an experiment but rather
at random points in time or space. For such events one may count the
number of occurrences of the event in a period of time (or space). However,
it makes no sense to speak of the number of nonoccurrences of such an
event in a period of time (or space). For example, suppose one observes
the number of airplanes arriving [...]. One can report how many airplanes
arrived [...], but it makes no sense to inquire how many airplanes did not
arrive [...]. [...] if one is observing the number [...]

[...] there is a positive constant μ such that, for any time interval of length h,

(i) the probability that exactly one event will occur in the interval is
approximately equal to μh, in the sense that it is equal to μh + r_1(h), and
r_1(h)/h tends to 0 as h tends to 0;

(ii) the probability that exactly zero events occur in the interval is
approximately equal to 1 − μh, in the sense that it is equal to 1 − μh +
r_2(h), and r_2(h)/h tends to 0 as h tends to 0; and

(iii) the probability that two or more events occur in the interval is equal
to a quantity r_3(h) such that the quotient r_3(h)/h tends to 0 as the length h
of the interval tends to 0.




The parameter μ may be interpreted as the mean rate at which events occur
per unit time (or space); consequently, we refer to μ as the mean rate of
occurrence (of events).


► Example 3A. Suppose one is observing the times at which automobiles
arrive at a toll collector's booth on a toll bridge. Let us suppose that we
are informed that the mean rate μ of arrival of automobiles is given by
μ = 1.5 automobiles per minute. The foregoing assumption then states
that in a time period of length h = 1 second = 1/60 minute, exactly one
car will arrive with approximate probability μh = (1.5)(1/60) = 1/40, whereas
exactly zero cars will arrive with approximate probability 1 − μh = 39/40. ◄


In addition to the assumption concerning the existence of the parameter
μ with the properties stated, we also make the assumption that if an interval
of time is divided into n subintervals and, for i = 1, ..., n, A_i denotes the
event that at least one event of the kind we are observing occurs in the
ith subinterval, then, for any integer n, A_1, ..., A_n are independent events.

We now show, under these assumptions, that the number of occurrences
of the event in a period of time (or space) of length (or area or volume) t
obeys a Poisson probability law with parameter μt; more precisely, the
probability that exactly k events occur in a time period of length t is equal to

(3.1) $$e^{-\mu t}\frac{(\mu t)^k}{k!}.$$

Consequently, we may describe briefly a sequence of events occurring in
time (or space), and which satisfy the foregoing assumptions, by saying
that the events obey a Poisson probability law at the rate of μ events per
unit time (or unit space).

Note that if X is the number of events occurring in a time interval of
length t, then X obeys a Poisson probability law with mean μt. Consequently,
μ is the mean rate of occurrence of events per unit time, in the
sense that the number of events occurring in a time interval of length 1
obeys a Poisson probability law with mean μ.

To prove (3.1), we divide the time period of length t into n time periods
of length h = t/n. Then the probability that k events will occur in the
time t is approximately equal to the probability that exactly one event has
occurred in exactly k of the n subintervals of time into which the original
interval was divided. By the foregoing assumptions, this is equal to the
probability of scoring exactly k successes in n independent repeated
Bernoulli trials in which the probability of success at each trial is
p = hμ = (μt)/n; this is equal to

(3.2) $$\binom{n}{k}\left(\frac{\mu t}{n}\right)^k\left(1-\frac{\mu t}{n}\right)^{n-k}.$$




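Now (3.2) is only an approximation to the probability that k events will
occur in time t. To get an exact evaluation, we must let the number of
subintervals increase to infinity. Then (3.2) tends to (3.1), since, rewriting
(3.2),

$$\frac{(\mu t)^k}{k!}\left(1-\frac{\mu t}{n}\right)^{n-k}\frac{n(n-1)\cdots(n-k+1)}{n^k}\to\frac{(\mu t)^k}{k!}e^{-\mu t}$$

as n → ∞, because (1 − μt/n)^n → e^{−μt}, while (1 − μt/n)^{−k} and
n(n − 1)···(n − k + 1)/n^k both tend to 1 for fixed k.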
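The limit can be watched numerically. The sketch below evaluates (3.2) for increasing numbers n of subintervals and compares it with (3.1); the values k = 3 and μt = 2 are arbitrary illustrative choices, not taken from the text.

```python
import math


def bernoulli_approx(k: int, mu_t: float, n: int) -> float:
    """Expression (3.2): k successes in n subintervals, each with success probability mu_t / n."""
    p = mu_t / n
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_law(k: int, mu_t: float) -> float:
    """Expression (3.1): e^{-mu t} (mu t)^k / k!."""
    return math.exp(-mu_t) * mu_t**k / math.factorial(k)


k, mu_t = 3, 2.0  # illustrative values
for n in (10, 100, 1_000, 10_000):
    approx = bernoulli_approx(k, mu_t, n)
    gap = abs(approx - poisson_law(k, mu_t))
    print(f"n = {n:6d}   (3.2) = {approx:.6f}   |(3.2) - (3.1)| = {gap:.2e}")
```

The gap falls roughly in proportion to 1/n, in line with the limiting argument above.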

It should be noted that the foregoing derivation of (3.1) is not completely
rigorous. To give a rigorous proof of (3.1), one must treat the random
phenomenon under consideration as a stochastic process. A sketch of
such a proof, using differential equations, is given in section 5.


► Example 3B. It is known that bacteria of a certain kind occur in water
at the rate of two bacteria per cubic centimeter of water. Assuming that
this phenomenon obeys a Poisson probability law, what is the probability
that a sample of two cubic centimeters of water will contain (i) no bacteria,
(ii) at least two bacteria?

Solution: [...] bacteria in a two-cubic-centimeter sample [...]
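With μ = 2 bacteria per cubic centimeter and a sample of t = 2 cubic centimeters, (3.1) gives a Poisson law with parameter μt = 4 for the count in the sample. The short computation below evaluates the two requested probabilities under that assumption.

```python
import math

mu = 2.0      # bacteria per cubic centimeter (given in the example)
t = 2.0       # sample volume in cubic centimeters
lam = mu * t  # Poisson parameter mu * t from (3.1)

p_none = math.exp(-lam)                          # P[X = 0] = e^{-4}
p_at_least_two = 1 - math.exp(-lam) * (1 + lam)  # 1 - P[X = 0] - P[X = 1] = 1 - 5e^{-4}
print(round(p_none, 4), round(p_at_least_two, 4))  # prints 0.0183 0.9084
```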


► Example 3C. Misprints. In a certain published book of 520 pages
390 typographical errors occur. What is the probability that four pages,
selected randomly by the printer as examples of his work, will be free from
errors?

Solution: [...] However, let us [...] probability law at the rate of
390/520 = 3/4 errors per page. The number of errors in four pages then
obeys a Poisson probability law with parameter (3/4)(4) = 3; consequently,
the probability is e^{−3} that there will be no errors in the four pages. ◄


[...] between the cathode and the anode is so great
that all electrons emitted by the cathode have such high velocities that
there is no accumulation of electrons between the cathode and the anode
(and thus no space charge). If we consider an emission of an electron from
the cathode as an event, then the assumptions preceding (3.1) may be
shown to be satisfied (see W. B. Davenport, Jr. and W. L. Root, An Introduction
to the Theory of Random Signals and Noise, McGraw-Hill, New York,
1958, pp. 112-119). Consequently, the number of electrons emitted from
the cathode in a time interval of length t obeys a Poisson probability law
with parameter λt, in which λ is the mean rate of emission of electrons
from the cathode. ◄


The Poisson probability law was first published in 1837 by Poisson in his
book Recherches sur la probabilité des jugements en matière criminelle et en
matière civile. In 1898, in a work entitled Das Gesetz der kleinen Zahlen,
Bortkewitz described various applications of the Poisson distribution.
However, until 1907 the Poisson distribution was regarded as more of a
curiosity than a useful scientific tool, since the applications made of it
were to such phenomena as the suicides of women and children and deaths
from the kick of a horse in the Prussian army. Because of its derivation as
a limit of the binomial law, the Poisson law was usually described as the
probability law of the number of successes in a very large number of
independent repeated trials, each with a very small probability of success.

In 1907 the celebrated statistician W. S. Gosset (writing, as was his
wont, under the pseudonym "Student") deduced the Poisson law as the
probability law of the number of minute corpuscles to be found in sample
drops of a liquid, under the assumption that the corpuscles are distributed
at random throughout the liquid; see "Student," "On the error of counting
with a Haemocytometer," Biometrika, Vol. 5, p. 351. In 1910 the Poisson
law was shown to fit the number of α-particles discharged per ⅛-minute
or 1-minute interval from a film of polonium; see Rutherford and Geiger,
"The probability variations in the distribution of α-particles," Philosophical
Magazine, Vol. 20, p. 700.


Although one is able to state assumptions under which a random
phenomenon will obey a Poisson probability law with some parameter λ,
the value of the constant λ cannot be deduced theoretically but must be
determined empirically. The determination of λ is a statistical problem.
The following procedure for the determination of λ can be justified on
various grounds. Given events occurring in time, choose an interval of
length t. Observe a large number N of time intervals of length t. For each
integer k = 0, 1, 2, ..., let N_k be the number of intervals in which exactly
k events have occurred. Let

(3.3) $$T=0\cdot N_0+1\cdot N_1+2\cdot N_2+\cdots+k\cdot N_k+\cdots$$




be the total number of events observed in the N intervals of length t. Then
the ratio T/N represents the observed average number of events happening
per time interval of length t. As an estimate λ̂ of the value of the parameter
λ, we take

(3.4) $$\hat\lambda=\frac{T}{N}.$$

If we believe that the random phenomenon under observation obeys a
Poisson probability law with parameter λ̂, then we may compute the
probability p(k; λ̂) that in a time interval of length t exactly k successes will
occur.


► Example 3E. Vacancies in the United States Supreme Court. W. A.
Wallis, writing on "The Poisson Distribution and the Supreme Court,"
Journal of the American Statistical Association, Vol. 31 (1936), pp. 376-380,
reports that vacancies in the United States Supreme Court, either by death
or resignation of members, occurred as follows during the 96 years, 1837
to 1932:

   k = number of vacancies      N_k = number of years
       during the year              with k vacancies
             0                            59
             1                            27
             2                             9
             3                             1
          over 3                           0

Since T = 27 + 2(9) + 3(1) = 48 and N = 96, it follows from (3.4) that
λ̂ = 0.5. If it is believed that vacancies in the Supreme Court occur in
accordance with a Poisson probability law at a mean rate of 0.5 a year, then
[...] that during his four-year term of office the next [...] appointments
to the Supreme Court.


The foregoing data also provide a method of testing the hypothesis that
vacancies in the Supreme Court obey a Poisson probability law at the rate of
0.5 vacancies per year. If this is the case, then the probability that in a
year there will be k vacancies is given by

$$p(k;0.5)=e^{-0.5}\frac{(0.5)^k}{k!},\qquad k=0,1,2,\ldots.$$

The expected number of years in N years in which k vacancies occur, which
is equal to Np(k; 0.5), may be computed and compared with the observed
number of years in which k vacancies have occurred; refer to Table 3A.


TABLE 3A

NUMBER OF YEARS OUT OF 96 IN WHICH k VACANCIES OCCUR

Number of       Probability     Expected Number of       Observed Number
Vacancies k     p(k; 0.5)       Years (96)p(k; 0.5)      of Years N_k
     0            0.6065              58.224                  59
     1            0.3033              29.117                  27
     2            0.0758               7.277                   9
     3            0.0126               1.210                   1
  Over 3          0.0018               0.173                   0


The observed and expected numbers may then be compared by various
statistical criteria (such as the χ²-test for goodness of fit) to determine
whether the observations are compatible with the hypothesis that the
number of vacancies obeys a Poisson probability law at a mean rate of 0.5. ◄
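The estimate (3.4) and the expected counts of Table 3A can be reproduced in a few lines. The sketch below also computes a χ² statistic. Pooling the classes k ≥ 2, so that every expected count is at least 5, is a common convention that we assume here; the text does not specify how the test should be carried out.

```python
import math

# Example 3E data: k -> number of years (out of 96) with k vacancies
observed = {0: 59, 1: 27, 2: 9, 3: 1}            # no year had more than 3
N = sum(observed.values())                        # 96 years
T = sum(k * n_k for k, n_k in observed.items())   # total vacancies, as in (3.3)
lam_hat = T / N                                   # estimate (3.4): 48 / 96 = 0.5


def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam**k / math.factorial(k)


# Probabilities and expected counts for k = 0, 1, 2, 3 and "over 3", as in Table 3A
probs = [poisson_pmf(k, lam_hat) for k in range(4)]
probs.append(1.0 - sum(probs))
expected = [N * q for q in probs]
obs = [observed[k] for k in range(4)] + [0]
for label, q, e, o in zip(["0", "1", "2", "3", "over 3"], probs, expected, obs):
    print(f"{label:>6}  {q:.4f}  {e:7.3f}  {o:3d}")

# Chi-square statistic after pooling k >= 2 (assumed convention: expected counts >= 5)
pooled_obs = obs[:2] + [sum(obs[2:])]
pooled_exp = expected[:2] + [sum(expected[2:])]
chi2 = sum((o - e) ** 2 / e for o, e in zip(pooled_obs, pooled_exp))
print(f"chi-square = {chi2:.3f} with 3 - 1 - 1 = 1 degree of freedom")
```

The expected counts differ slightly from Table 3A in the third decimal place because the table multiplies the rounded probabilities by 96. The statistic comes out near 0.37, far below 3.84, the 95th percentile of the χ² law with one degree of freedom.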


The Poisson, and related, probability laws arise in a variety of ways in 
the mathematical theory of queues (waiting lines) and the mathematical 
theory of inventory and production control. We give a very simple 
example of an inventory problem. It should be noted that to make the 
following example more realistic one must take into account the costs of 
the various actions available. 
► Example 3F. An inventory problem. Suppose a retailer discovers that
the number of items of a certain kind demanded by customers in a given
time period obeys a Poisson probability law with known parameter λ.
What stock K of this item should the retailer have on hand at the beginning
of the time period in order to have a probability 0.99 that he will be able
to supply immediately all customers who demand the item during the time
period under consideration?

Solution: The problem is to find the number K such that the probability
is 0.99 that there will be K or fewer occurrences during the time period of the
event when the item is demanded. Since the number of occurrences of this
event obeys a Poisson probability law with parameter λ, we seek the
integer K such that

(3.5) $$\sum_{k=0}^{K}e^{-\lambda}\frac{\lambda^k}{k!}\ge 0.99,\qquad \sum_{k=K+1}^{\infty}e^{-\lambda}\frac{\lambda^k}{k!}\le 0.01.$$

The second inequality in (3.5) can be read from Molina's tables (E. C. Molina,
Poisson's Exponential Binomial Limit, Van Nostrand, New York, 1942).
If λ is so large that the normal approximation to the Poisson law may be
used, then (3.5) may be solved explicitly for K. Since the first sum in (3.5)
is approximately equal to

$$\int_{-\infty}^{(K-\lambda+\frac12)/\sqrt{\lambda}}\phi(x)\,dx,$$

K should be chosen so that (K − λ + ½)/√λ = 2.326, or

(3.6) $$K=2.326\sqrt{\lambda}+\lambda-\tfrac{1}{2}.$$ ◄
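When tables are not at hand, the first inequality in (3.5) can be solved by a direct search, accumulating Poisson terms until the running sum reaches the required level. The sketch below does that and, for comparison, evaluates (3.6) with the exact 99th percentile from `statistics.NormalDist` instead of 2.326. Both function names are hypothetical.

```python
import math
from statistics import NormalDist


def smallest_stock(lam: float, level: float = 0.99) -> int:
    """Smallest integer K with sum_{k=0}^{K} e^{-lam} lam^k / k! >= level, as in (3.5)."""
    term = math.exp(-lam)
    cdf, K = term, 0
    while cdf < level:
        K += 1
        term *= lam / K
        cdf += term
    return K


def normal_stock(lam: float, level: float = 0.99) -> float:
    """Right-hand side of (3.6), using the exact normal percentile instead of 2.326."""
    return NormalDist().inv_cdf(level) * math.sqrt(lam) + lam - 0.5


for lam in (1.6, 3.0, 25.0, 100.0):
    print(f"lambda = {lam:6.1f}   exact K = {smallest_stock(lam):4d}   (3.6) gives {normal_stock(lam):7.2f}")
```

For example, treating the hospital-bed exercise with a Poisson approximation gives λ = 8000 × 1/5000 = 1.6, and the direct search returns K = 5. The normal formula (3.6) is intended for large λ, where it and the search should roughly agree.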


THEORETICAL EXERCISES 


3.1. A problem of aerial search. State conditions for the validity of the
following assertion: if N ships are distributed at random over a region
of the ocean of area A, and if a plane can search over Q square miles
of ocean per hour of flight, then the number of ships sighted by a plane
in a flight of T hours obeys a Poisson probability law with parameter
λ = NQT/A.

3.2. The number of matches approximately obeys a Poisson probability law.
Consider the number of matches obtained by distributing M balls,
numbered 1 to M, among M urns in such a way that each urn contains
exactly 1 ball. Show that the probability of exactly m matches tends to
e^{−1}(1/m!) as M tends to infinity, so that for large M the number of matches
approximately obeys a Poisson probability law with parameter 1.
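The convergence claimed in theoretical exercise 3.2 can be checked against the classical inclusion-exclusion formula for the matching problem, P[exactly m matches] = (1/m!) Σ_{j=0}^{M−m} (−1)^j/j!. That formula is assumed here and is not derived in this section. The sketch also assumes the urns are numbered 1 to M, that a match means ball i lands in urn i, and that all M! arrangements are equally likely.

```python
import math


def prob_exact_matches(M: int, m: int) -> float:
    """P[exactly m matches] among M balls in M urns, via inclusion-exclusion (assumed formula)."""
    if not 0 <= m <= M:
        return 0.0
    s = sum((-1) ** j / math.factorial(j) for j in range(M - m + 1))
    return s / math.factorial(m)


def poisson_limit(m: int) -> float:
    """Limit e^{-1} / m! asserted in theoretical exercise 3.2."""
    return math.exp(-1) / math.factorial(m)


for M in (4, 8, 16):
    gaps = [abs(prob_exact_matches(M, m) - poisson_limit(m)) for m in range(M + 1)]
    print(f"M = {M:2d}   largest |exact - e^(-1)/m!| = {max(gaps):.2e}")
```

For M = 4, the formula gives 9/24 for no matches and 8/24 for exactly one match, which agrees with a direct count of the 24 permutations.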


EXERCISES 


State carefully the probabilistic assumptions under which you solve the
following problems. Keep in mind the empirically observed fact that the
occurrence of accidents, errors, breakdowns, and so on, in many instances
appears to obey Poisson probability laws.


3.1. The incidence of polio during the years 1949-1954 was approximately 25
per 100,000 population. In a city of 40,000 what is the probability of
having 5 or fewer cases? In a city of 1,000,000 what is the probability of
having 5 or fewer cases? State your assumptions.

3.2. A manufacturer of wool blankets inspects the blankets by counting the
number of defects. (A defect may be a tear, an oil spot, etc.) From past
records it is known that the mean number of defects per blanket is 5.
Calculate the probability that a blanket will contain 2 or more defects.


3.3. Bank tellers in a certain bank make errors in entering figures in their
ledgers at the rate of 0.75 error per page of entries. What is the probability
that in 4 pages there will be 2 or more errors?




3.4. Workers in a certain factory incur accidents at the rate of 2 accidents per
week. Calculate the probability that there will be 2 or fewer accidents
during (i) 1 week, (ii) 2 weeks; (iii) calculate the probability that there
will be 2 or fewer accidents in each of 2 weeks.


3.5. A radioactive source is observed during 4 time intervals of 6 seconds each.
The number of particles emitted during each period is counted. If the
particles emitted obey a Poisson probability law, at a rate of 0.5 particles
emitted per second, find the probability that (i) in each of the 4 time
intervals 3 or more particles will be emitted, (ii) in at least 1 of the 4
time intervals 3 or more particles will be emitted.


3.6. Suppose that the suicide rate in a certain state is 1 suicide per 250,000
inhabitants per week.

(i) Find the probability that in a certain town of population 500,000 there
will be 6 or more suicides in a week.

(ii) What is the expected number of weeks in a year in which 6 or more
suicides will be reported in this town?

(iii) Would you find it surprising that during 1 year there were at least 2
weeks in which 6 or more suicides were reported?


3.7. Suppose that customers enter a certain shop at the rate of 30 persons an
hour.

(i) What is the probability that during a 2-minute interval either no one
will enter the shop or at least 2 persons will enter the shop?

(ii) If you observed the number of persons entering the shop during each
of 30 2-minute intervals, would you find it surprising that 20 or more of
these intervals had the property that either no one or at least 2 persons
entered the shop during that time?


3.8. Suppose that the telephone calls coming into a certain switchboard obey
a Poisson probability law at a rate of 16 calls per minute. If the switchboard
can handle at most 24 calls per minute, what is the probability,
using a normal approximation, that in 1 minute the switchboard will
receive more calls than it can handle (assume all lines are clear)?


3.9. In a large fleet of delivery trucks the average number inoperative on any
day because of repairs is [illegible]. Two standby trucks are available. What is
the probability that on any day (i) no standby trucks will be needed,
(ii) the number of standby trucks is inadequate?

3.10. Major motor failures occur among the buses of a large bus company at
the rate of 2 a day. Assuming that each motor failure requires the services
of 1 mechanic for a whole day, how many mechanics should the bus
company employ to insure that the probability is at least 0.95 that a
mechanic will be available to repair each motor as it fails? (More precisely,
find the smallest integer K such that the probability is greater than or
equal to 0.95 that K or fewer motor failures will occur in a day.)


3.11. Consider a restaurant located in the business section of a city. How many
seats should it have available if it wishes to serve at least 95% of all those
who desire its services in a given hour, assuming that potential customers
(each of whom takes at least an hour to eat) arrive in accord with the
following schemes:

(i) 1000 persons pass by the restaurant in a given hour, each of whom has
probability 1/100 of desiring to eat in the restaurant (that is, each person
passing by the restaurant enters the restaurant once in every 100 times);
(ii) persons, each of whom has probability 1/100 of desiring to eat in the
restaurant, pass by the restaurant at the rate of 1000 an hour;
(iii) persons, desiring to be patrons of the restaurant, arrive at the restaurant
at the rate of 10 an hour.


3.12. Flying-bomb hits on London. The following data (R. D. Clarke, "An
application of the Poisson distribution," Journal of the Institute of Actuaries,
Vol. 72 (1946), p. 48) give the number of flying-bomb hits recorded in
each of 576 small areas of t = ¼ square kilometer each in the south of
London during World War II.

   k = number of flying-       N_k = number of areas
       bomb hits per area            with k hits
             0                           229
             1                           211
             2                            93
             3                            35
             4                             7
         5 or over                         1

Using the procedure in example 3E, show that these observations are
well fitted by a Poisson probability law.


For each of the followin: 


ea g numerical valued random phenomena state 
conditions under which i 


: t may be expected to obey, either exactly or 
approximately, a Poisson probability law: (i) the number of telephone 
calls received at a given switchboard per minute; (ii) the number of 
automobiles passing a given point on a highway per minute; (iii) the 
number of bacterial colonies in a given culture per 0.01 Square millimeter 
ona microscope Slide; (iv) the number of times one receives 4 aces per 
75 hands of bridge; (v) the number of defective screws per box of 100. 
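The flying-bomb exercise refers to a procedure in example 3E, which is not reproduced here. The sketch below therefore uses one common approach and should be read with that caveat in mind. It estimates λ by the sample mean number of hits per area, counting the single "5 or over" area as exactly 5 (an assumption on our part). It then compares the observed counts with 576 times the fitted Poisson probabilities, and assigns the remaining probability to the tail cell.

```python
import math

# Flying-bomb data: number of areas with k hits; the last cell is "5 or over".
counts = [229, 211, 93, 35, 7, 1]
n_areas = sum(counts)  # 576

# Estimate the Poisson mean by the sample mean, treating "5 or over" as 5
# (an assumption; only one area falls in that cell).
total_hits = sum(k * c for k, c in enumerate(counts))
lam = total_hits / n_areas

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

expected = [n_areas * poisson_pmf(k, lam) for k in range(5)]
expected.append(n_areas - sum(expected))  # tail cell: 5 or more hits

for k, (obs, exp) in enumerate(zip(counts, expected)):
    label = str(k) if k < 5 else "5+"
    print(f"{label:>3}  observed {obs:4d}  expected {exp:7.2f}")
```

The observed and expected columns agree closely in every cell. A formal goodness-of-fit test could be layered on top, but it is left out of this sketch.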


4. THE EXPONENTIAL AND GAMMA PROBABILITY LAWS 


It has already been seen that the geometric and negative binomial
probability laws arise in response to the following question: through how
many trials need one wait in order to achieve the rth success in a sequence
of independent repeated Bernoulli trials in which the probability of success
at each trial is p? In the same way, exponential and gamma probability
laws arise in response to the question: how long a time need one wait if
one is observing a sequence of events occurring in time in accordance with
a Poisson probability law at the rate of λ events per unit time in order to
observe the rth occurrence of the event?


► Example 4A. How long will a toll collector at a toll station at which
automobiles arrive at the mean rate λ = 1.5 automobiles per minute have
to wait before he collects the rth toll, for any integer r = 1, 2, ...? ◄


We now show that the waiting time to the rth event in a series of events
happening in accordance with a Poisson probability law at the rate of λ
events per unit of time (or space) obeys a gamma probability law with
parameters r and λ; consequently, it has probability density function

(4.1) $f(t) = \dfrac{\lambda}{(r-1)!}(\lambda t)^{r-1} e^{-\lambda t}$ for $t \ge 0$; $f(t) = 0$ for $t < 0$.

In particular, the waiting time to the first event obeys the exponential
probability law with parameter λ (or, equivalently, the gamma probability
law with parameters r = 1 and λ) with probability density function

(4.2) $f(t) = \lambda e^{-\lambda t}$ for $t \ge 0$; $f(t) = 0$ for $t < 0$.
To prove (4.1), first find the distribution function of the time of
occurrence of the rth event. For t > 0, let F_r(t) denote the probability that
the time of occurrence of the rth event will be less than or equal to t. Then
1 − F_r(t) represents the probability that the time of occurrence of the rth
event will be greater than t. Equivalently, 1 − F_r(t) is the probability that
the number of events occurring in the time from 0 to t is less than r;
consequently,

(4.3) $1 - F_r(t) = \sum_{k=0}^{r-1} e^{-\lambda t}\,\dfrac{(\lambda t)^k}{k!}$.

By differentiating (4.3) with respect to t, one obtains (4.1); the derivative
of the sum telescopes, leaving only −λ e^{−λt}(λt)^{r−1}/(r − 1)!.
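The differentiation step can be checked numerically. The Python sketch below codes the Poisson sum (4.3) and the gamma density (4.1), and compares the density with a central finite-difference derivative of 1 − F_r(t). The parameter values r = 3, λ = 1.5, t = 2 are arbitrary illustrative choices.

```python
import math

def gamma_survival(t, r, lam):
    """1 - F_r(t) from (4.3): P[fewer than r Poisson events in (0, t]]."""
    x = lam * t
    return sum(math.exp(-x) * x ** k / math.factorial(k) for k in range(r))

def gamma_density(t, r, lam):
    """Gamma density (4.1) with parameters r and lam."""
    if t < 0:
        return 0.0
    return lam * (lam * t) ** (r - 1) * math.exp(-lam * t) / math.factorial(r - 1)

# f_r(t) = -d/dt [1 - F_r(t)]; compare with a central finite difference.
r, lam, t, h = 3, 1.5, 2.0, 1e-5
numeric = -(gamma_survival(t + h, r, lam) - gamma_survival(t - h, r, lam)) / (2 * h)
print(numeric, gamma_density(t, r, lam))  # the two values agree closely
```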


► Example 4B. Consider a baby who cries at random times at a mean
rate of six distinct times per hour. If his parents respond only to every
second time, what is the probability that ten or more minutes will elapse
between two responses of the parents to the baby?

Solution: From the assumptions given (which may not be entirely
realistic) the length T in hours of the time interval between two responses
obeys a gamma probability law with parameters r = 2 and λ = 6.
Consequently,

(4.4) $P[T > \tfrac{1}{6}] = \int_{1/6}^{\infty} 6(6t)e^{-6t}\,dt = 2e^{-1}$,

in which the integral has been evaluated by using (4.3). If the parents
responded only to every third cry of the baby, then

$P[T > \tfrac{1}{6}] = \int_{1/6}^{\infty} 6\,\dfrac{(6t)^2}{2!}e^{-6t}\,dt = e^{-1}\left(1 + 1 + \tfrac{1}{2!}\right) = \tfrac{5}{2}e^{-1}$.

More generally, if the parents responded only to every rth cry of the baby,
then

(4.5) $P[T > \tfrac{1}{6}] = \int_{1/6}^{\infty} 6\,\dfrac{(6t)^{r-1}}{(r-1)!}e^{-6t}\,dt = e^{-1}\left(1 + \tfrac{1}{1!} + \tfrac{1}{2!} + \cdots + \tfrac{1}{(r-1)!}\right)$. ◄
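Formula (4.5) is easy to evaluate for several values of r. The sketch below assumes the setting of Example 4B: cries at rate 6 per hour and a 10-minute (1/6 hour) gap. It uses the Poisson-sum form (4.3) rather than integrating the density.

```python
import math

def prob_gap_exceeds(r, rate=6.0, t=1 / 6):
    """P[T > t] for T gamma(r, rate), computed from the Poisson sum (4.3)."""
    x = rate * t  # expected number of cries in t hours; here 6 * (1/6) = 1
    return math.exp(-x) * sum(x ** k / math.factorial(k) for k in range(r))

for r in (1, 2, 3, 4):
    print(r, round(prob_gap_exceeds(r), 4))
```

For r = 2 and r = 3 this gives 2e^{−1} ≈ 0.7358 and (5/2)e^{−1} ≈ 0.9197, as in the text.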


The exponential and gamma probability laws are of great importance 
in applied probability theory, since recent studies have indicated that in 
addition to describing the lengths of waiting times they also describe such 
numerical valued random phenomena as the life of an electron tube, the 
time intervals between successive breakdowns of an electronic system, the 
time intervals between accidents, such as explosions in mines, and so on. 

The exponential probability law may be characterized in a manner that
illuminates its applicability as a law of waiting times or as a law of time
to failure. Let T be the observed waiting time (or time to failure). By
definition, T obeys an exponential probability law with parameter λ if and
only if for every a > 0

(4.6) $P[T > a] = 1 - F(a) = \int_a^{\infty} \lambda e^{-\lambda t}\,dt = e^{-\lambda a}$.

It then follows that for any positive numbers a and b

(4.7) $P[T > a + b \mid T > b] = \dfrac{e^{-\lambda(a+b)}}{e^{-\lambda b}} = e^{-\lambda a} = P[T > a]$.

In words, (4.7) says that, given an item of equipment that has served b or
more time units, its conditional probability of serving a + b or more time
units is the same as its original probability, when first put into service, of
serving a or more time units. Another way of expressing (4.7) is to say that
if the time to failure of a piece of equipment obeys an exponential
probability law, then the equipment is not subject to wear or to fatigue.
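Property (4.7) can also be seen by simulation. The sketch below assumes illustrative values λ = 0.5, a = 1, b = 2 and a fixed random seed. It draws exponential lifetimes with Python's `random.expovariate`, which takes the rate λ as its argument. It then compares the fraction of survivors past b that also survive past a + b with the unconditional fraction surviving past a.

```python
import math
import random

# Memorylessness (4.7), checked by simulation; lam, a, b are illustrative values.
rng = random.Random(12345)          # fixed seed so the run is reproducible
lam, a, b = 0.5, 1.0, 2.0
samples = [rng.expovariate(lam) for _ in range(200_000)]  # argument is the rate

survivors = [x for x in samples if x > b]                   # still working after b units
p_cond = sum(x > a + b for x in survivors) / len(survivors)  # estimates P[T > a+b | T > b]
p_uncond = sum(x > a for x in samples) / len(samples)        # estimates P[T > a]

print(round(p_cond, 3), round(p_uncond, 3), round(math.exp(-lam * a), 3))
```

Both estimates should land near e^{−λa} ≈ 0.607, up to sampling error.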

The converse is also true, as we now show. If the time to failure of an
item of equipment obeys (4.7), then it obeys an exponential probability law.
More precisely, let F(x) be the distribution function of the time to failure and
assume that F(x) = 0 for x < 0, F(x) < 1 for x > 0, and

(4.8) $\dfrac{1 - F(x + y)}{1 - F(y)} = 1 - F(x)$ for $x, y > 0$.

Then necessarily, for some constant λ > 0,

(4.9) $1 - F(x) = e^{-\lambda x}$ for $x > 0$.

If we define g(x) = log_e [1 − F(x)], then the foregoing assertion follows
from a more general theorem. Indeed, (4.8) states precisely that
g(x + y) = g(x) + g(y) for x, y > 0. Moreover, g is bounded for 0 < x ≤ 1,
since there 0 ≥ g(x) ≥ log_e [1 − F(1)] > −∞. The theorem therefore yields
g(x) = g(1)x, which is (4.9) with λ = −g(1).

THEOREM. If a function g(x) satisfies the functional equation

(4.10) $g(x + y) = g(x) + g(y)$, $x, y > 0$,

and is bounded in the interval 0 to 1,

(4.11) $|g(x)| < M$, $0 < x \le 1$,

for some constant M, then the function g(x) is given by

(4.12) $g(x) = g(1)\,x$, $x > 0$.
Proof: Suppose that (4.12) were not true. Then the function
G(x) = g(x) − g(1)x would not vanish identically in x. Let x_0 > 0 be a point
such that G(x_0) ≠ 0. Now it is clear that G(x) satisfies the functional
equation in (4.10). Therefore, G(2x_0) = G(x_0) + G(x_0), and, for any
positive integer n, G(nx_0) = nG(x_0). Consequently, lim_{n→∞} |G(nx_0)| = ∞.
We now show that this cannot be true, since the function G(x) satisfies the
inequality |G(x)| < 2M for all x > 0, in which M is the constant given in
(4.11). To prove this, note that G(1) = 0. Since G(x) satisfies the functional
equation in (4.10), it follows that, for any positive integer n, G(n) = 0 and
G(n + x) = G(x) for 0 < x ≤ 1. Thus G(x) is a function that is periodic, with
period 1. By (4.11), G(x) satisfies the inequality
|G(x)| ≤ |g(x)| + |g(1)| < 2M for 0 < x ≤ 1. Being periodic with period 1,
it therefore satisfies this inequality for all x > 0. The proof of the theorem
is now complete.


For references to the history of the foregoing theorem, and a generaliza- 
tion, the reader may consult G. S. Young, “The Linear Functional 
Equation,” American Mathematical Monthly, Vol. 65 (1958), pp. 37-38. 


EXERCISES 


4.1. Consider a radar set of a type whose failure law is exponential. If radar
sets of this type have a failure rate λ = 1 set/1000 hours, find a length T of
time such that the probability is 0.99 that a set will operate satisfactorily
for a time greater than T.

4.2. The lifetime in hours of a radio tube of a certain type obeys an exponential
law with parameter (i) λ = 1000, (ii) λ = 1/1000. A company producing
these tubes wishes to guarantee them a certain lifetime. For how many
hours should the tube be guaranteed to function, to achieve a probability
of 0.95 that it will function at least the number of hours guaranteed?




4.3. Describe the probability law of the following random phenomenon: the
number N of times a fair die is tossed until an even number appears (i) for
the first time, (ii) for the second time, (iii) for the third time.


4.4. A fair coin is tossed until heads appears for the first time. What is the 
probability that 3 tails will appear in the series of tosses? 


4.5. The customers of a certain newsboy arrive in accordance with a Poisson 
probability law at a rate of 1 customer per minute. What is the probability 


that 5 or more minutes have elapsed since (i) his last customer arrived, (ii)
his next to last customer arrived?


4.6. Suppose that a certain digital computer, which operates 24 hours a day, 
suffers breakdowns at the rate of 0.25 per hour. We observe that the 


computer has performed satisfactorily for 2 hours. What is the probability 
that the machine will not fail within the next 2 hours? 


4.7. Assume that the probability of failure of a ball bearing at any revolution
is constant and equal to p. What is the probability that the ball bearing
will fail on or before the nth revolution? If p = 10⁻⁴, how many revolutions
will be reached before 10% of such ball bearings fail? More precisely, find
the largest K such that P[X ≤ K] ≤ 0.1, where X is the number of revolutions to failure.


4.8. A lepidopterist wishes to estimate the frequency with which an unusual
form of a certain species of butterfly occurs in a particular district. He
catches individual specimens of the species until he has obtained exactly
5 butterflies of the form desired. Suppose that the total number of
butterflies caught is equal to 25. Find the probability that 25 butterflies would
have to be caught in order to obtain 5 of a desired form, if the relative
frequency p of occurrence of butterflies of the desired form is given by
(i) p = $, (ii) p = 1.

4.9. Consider a shop at which customers arrive at random at a rate of 30 per
hour. What fraction of the time intervals between successive arrivals will
be (i) longer than 2 minutes, (ii) shorter than 4 minutes, (iii) between 1 and
minutes?


5. BIRTH AND DEATH PROCESSES 


In this section we indicate briefly how one may derive the Poisson
probability law, and various related probability laws, by means of
differential equations. The process to be examined is treated in the
literature of stochastic processes under the name "birth and death."

Consider a population, such as the molecules present in a certain
subvolume of gas, the particles emitted by a radioactive source, biological
organisms of a certain kind present in a certain environment, persons
waiting in a line (queue) for service, and so on. Let X_t be the size of the
population at a given time t. The probability law of X_t is specified by its
probability mass function,

(5.1) $p(n; t) = P[X_t = n]$, $n = 0, 1, 2, \ldots$




A differential equation for the probability mass function of X_t may be
found under assumptions similar in spirit to, but somewhat more general 
than, those made in deriving (3.1). In reading the following discussion 
the reader should attempt to formulate explicitly for himself the assump- 
tions that are being made. A rigorous treatment of this discussion is given 
by W. Feller, An Introduction to Probability Theory and its Applications, 
Wiley, 1957, pp. 397-411. 

Let r_0(h), r_1(h), and r_2(h) be functions defined for h > 0 with the
property that

$\lim_{h \to 0} \dfrac{r_0(h)}{h} = \lim_{h \to 0} \dfrac{r_1(h)}{h} = \lim_{h \to 0} \dfrac{r_2(h)}{h} = 0$.

Assume that the probability is r_2(h) that in the time from t to t + h the
population size will change by two or more. For n ≥ 1 the event that
X_{t+h} = n (n members in the population at time t + h) can then essentially
happen in any one of three mutually exclusive ways: (i) the population
size at time t is n and undergoes no change in the time from t to t + h;
(ii) the population size at time t is n − 1 and increases by one in the time
from t to t + h; (iii) the population size at time t is n + 1 and decreases
by one in the time from t to t + h. For n = 0, the event that X_{t+h} = 0
can happen only in ways (i) and (iii). Now let us introduce quantities
λ_n and μ_n defined as follows: λ_n h + r_0(h), for any time t and positive
value of h, is the conditional probability that the population size will
increase by one in the time from t to t + h, given that the population had
size n at time t, whereas μ_n h + r_1(h) is the conditional probability that
the population size will decrease by one in the time from t to t + h, given
that the population had size n at time t. In symbols, λ_n and μ_n are such
that, for any time t and small h > 0,

(5.2) $\lambda_n h \approx P[X_{t+h} - X_t = 1 \mid X_t = n]$, $n \ge 0$,
      $\mu_n h \approx P[X_{t+h} - X_t = -1 \mid X_t = n]$, $n \ge 1$;

the approximation in (5.2) is such that the difference between the two
sides of each equation tends to 0 faster than h, as h tends to 0. In writing
the next equations we omit terms that tend to 0 faster than h, as h tends
to 0, since these terms vanish in deriving the differential equations in
(5.10) and (5.11). The reader may wish to verify this statement for himself.

The event (i) then has probability

(5.3) $p(n; t)(1 - \lambda_n h - \mu_n h)$;

the event (ii) has probability

(5.4) $p(n-1; t)\,\lambda_{n-1} h$;

the event (iii) has probability

(5.5) $p(n+1; t)\,\mu_{n+1} h$.

Consequently, one obtains for n ≥ 1

(5.6) $p(n; t+h) = p(n; t)(1 - \lambda_n h - \mu_n h) + p(n-1; t)\,\lambda_{n-1} h + p(n+1; t)\,\mu_{n+1} h$.

For n = 0 one obtains

(5.7) $p(0; t+h) = p(0; t)(1 - \lambda_0 h) + p(1; t)\,\mu_1 h$.


It may be noted that if there is a maximum possible value N for the
population size then (5.6) holds only for 1 ≤ n ≤ N − 1, whereas for
n = N one obtains

(5.8) $p(N; t+h) = p(N; t)(1 - \mu_N h) + p(N-1; t)\,\lambda_{N-1} h$.


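Rearranging (5.6), one obtains

(5.9) $\dfrac{p(n; t+h) - p(n; t)}{h} = -(\lambda_n + \mu_n)\,p(n; t) + \lambda_{n-1}\,p(n-1; t) + \mu_{n+1}\,p(n+1; t)$.

Letting h tend to 0, one finally obtains for n ≥ 1

(5.10) $\dfrac{\partial}{\partial t}\,p(n; t) = -(\lambda_n + \mu_n)\,p(n; t) + \lambda_{n-1}\,p(n-1; t) + \mu_{n+1}\,p(n+1; t)$.

Similarly, for n = 0 one obtains

(5.11) $\dfrac{\partial}{\partial t}\,p(0; t) = -\lambda_0\,p(0; t) + \mu_1\,p(1; t)$.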


The question of the existence and uniqueness of solutions of these equations
is nontrivial and is not discussed here.


We solve these equations only in the case that

(5.12) $\lambda_0 = \lambda_1 = \lambda_2 = \cdots = \lambda$, $\mu_1 = \mu_2 = \cdots = 0$,

which corresponds to the assumptions made before (3.1). Then (5.11)
becomes

(5.13) $\dfrac{\partial}{\partial t}\,p(0; t) = -\lambda\,p(0; t)$,

which has solution (under the assumption p(0; 0) = 1)

(5.14) $p(0; t) = e^{-\lambda t}$.


Next (5.10) for the case n = 1 becomes

(5.15) $\dfrac{\partial}{\partial t}\,p(1; t) = -\lambda\,p(1; t) + \lambda\,p(0; t)$,

which has solution (under the assumption p(1; 0) = 0)

(5.16) $p(1; t) = \lambda e^{-\lambda t} \int_0^t e^{\lambda t'}\,p(0; t')\,dt' = \lambda t\,e^{-\lambda t}$.

Proceeding inductively, one obtains (assuming p(n; 0) = 0 for n ≥ 1)

(5.17) $p(n; t) = \dfrac{(\lambda t)^n}{n!}\,e^{-\lambda t}$,

so that the size X_t of the population at time t obeys a Poisson probability
law with mean λt.
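Equations (5.10) and (5.11) can also be integrated numerically, which gives an independent check of (5.17). The sketch below makes two simplifications of our own, neither of which is part of the text's derivation. First, it truncates the state space at a maximum size N and uses the boundary equation (5.8) there. Second, it advances time with a plain Euler step. The helper `birth_death_step` is hypothetical and accepts arbitrary rate functions λ_n and μ_n. The example run uses the constant-rate case (5.12).

```python
import math

def birth_death_step(p, lam, mu, h):
    """One Euler step of (5.10)-(5.11) on the truncated states 0..N, N = len(p) - 1.

    lam(n) and mu(n) are the rates lambda_n and mu_n; the top state N uses (5.8),
    i.e. no births out of N.
    """
    N = len(p) - 1
    new = p[:]
    for n in range(N + 1):
        out_rate = (lam(n) if n < N else 0.0) + (mu(n) if n > 0 else 0.0)
        dp = -out_rate * p[n]
        if n > 0:
            dp += lam(n - 1) * p[n - 1]   # births from n - 1
        if n < N:
            dp += mu(n + 1) * p[n + 1]    # deaths from n + 1
        new[n] = p[n] + h * dp
    return new

# Case (5.12): lambda_n = rate for all n, mu_n = 0, starting from p(0; 0) = 1.
rate, t_end, N, h = 2.0, 1.0, 25, 1e-4
p = [1.0] + [0.0] * N
for _ in range(round(t_end / h)):
    p = birth_death_step(p, lambda n: rate, lambda n: 0.0, h)

poisson = [math.exp(-rate * t_end) * (rate * t_end) ** n / math.factorial(n)
           for n in range(N + 1)]
max_err = max(abs(x - y) for x, y in zip(p, poisson))
print(max_err)  # small; shrinks roughly in proportion to h
```

Nonconstant rates, such as λ_n = nλ, can be supplied as different `lam` and `mu` functions without changing the step function.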


THEORETICAL EXERCISES 


5.1. The Yule process. Consider a population whose members can (by splitting
or otherwise) give birth to new members but cannot die. Assume that the
probability is approximately equal to λh that in a short time interval of
length h a member will create a new member. More precisely, in the model
of section 5, assume that

$\lambda_n = n\lambda$, $\mu_n = 0$.

If at time 0 the population size is k, show that the probability that the
population size at time t is equal to n is given by

(5.18) $p(n; t) = \dbinom{n-1}{n-k}\,e^{-k\lambda t}\left(1 - e^{-\lambda t}\right)^{n-k}$, $n \ge k$.

Show that the probability law defined by (5.18) has mean m and variance
σ² given by

(5.19) $m = k e^{\lambda t}$, $\sigma^2 = k e^{\lambda t}\left(e^{\lambda t} - 1\right)$.
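Before proving (5.19), a reader may wish to confirm it numerically. The sketch below sums the probability mass function (5.18) over a long but finite range of n. The cutoff of 400 is our own choice; the neglected tail is far below floating-point precision for these parameters. The values k = 3, λ = 1, t = 0.5 are illustrative.

```python
import math

def yule_pmf(n, t, k, lam):
    """p(n; t) from (5.18): population size n at time t, starting from k."""
    if n < k:
        return 0.0
    q = math.exp(-lam * t)
    return math.comb(n - 1, n - k) * q ** k * (1 - q) ** (n - k)

k, lam, t = 3, 1.0, 0.5
ns = range(k, 400)  # truncation of the infinite sum
probs = [yule_pmf(n, t, k, lam) for n in ns]
mean = sum(n * p for n, p in zip(ns, probs))
var = sum(n * n * p for n, p in zip(ns, probs)) - mean ** 2

print(sum(probs), mean, k * math.exp(lam * t))
print(var, k * math.exp(lam * t) * (math.exp(lam * t) - 1))
```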


CHAPTER 7


Random Variables 


it has been stressed in the foregoin 
random event can be discussed on 


gy of sample description 
€ shall see, the notion is 
many applications of probability theory 


1. THE NOTION OF A RANDOM VARIABLE


In applications of Probability theory one usually has to deal simul- 
random phenomena. In section 7 of Chapter 4, we 
phenomena by means of the 
phenomenon. However, this 


> V, and 
added, so that XX. 




++ are random variables. For 




the purpose of defining the terminology we consider a random variable 
which we denote by X. 

The notion of a random variable is intimately related to the notion of
a function, as the following definitions indicate.


THE DEFINITION OF A FUNCTION. An object X, or X(·), is said to be a
function defined on a space S if for every member s of S there is a real
number, denoted by X(s), which is called the value of the function X at s.


THE DEFINITION OF A RANDOM VARIABLE. An object X is said to be a
random variable if (i) it is a real valued function defined on a sample
description space on a family of whose subsets a probability function P[·]
has been defined, and (ii) for every Borel set B of real numbers the set
{s: X(s) is in B} belongs to the domain of P[·].


A random variable then is a function defined on the outcome of a random
phenomenon; consequently, the value of a random variable is a random
phenomenon and indeed is a numerical valued random phenomenon.
Conversely, every numerical valued random phenomenon can be interpreted
as the value of a random variable X; namely, the random variable X
defined on the real line for every real number x by X(x) = x.

One of the major difficulties students have with the notion of a random 
variable is that objects that are random variables are not always defined in 
a manner to make this fact explicit. However, we have previously 
encountered a similar situation with regard to the notion of a random 
event. We have defined a random event as a set on a sample description 
space on which a probability function is defined. In everyday discourse
random events are defined verbally, so that in order to discuss a random 
event one must first formulate the event in a mathematical manner as a set. 
Similarly, with regard to random variables, one must learn how to 
recognize, and formulate mathematically as functions, verbally described 


objects that are random variables. 


► Example 1A. The number of white balls in a sample is a random variable.
Let us consider the object X defined as follows: X is the number of white
balls in a sample of size 2 drawn without replacement from an urn
containing 6 balls, of which 4 are white. The sample description space S of
the experiment of drawing the sample may be taken as the set of 30 ordered
2-tuples given in (3.1) of Chapter 1, in which the white balls have been
numbered 1 to 4 and the remaining 2 balls, 5 and 6. To render S a
probability space, we need to define a probability function upon its subsets;
let us do so by assuming all descriptions equally likely. The number X of
white balls in the sample drawn can be regarded as a function on this
probability space, for if the sample description s is known then the value
of X is known:

(1.1) X(s) = 0 if s = (5, 6), (6, 5)
           = 1 if s = (1, 5), (1, 6), (2, 5), (2, 6), (3, 5), (3, 6), (4, 5), (4, 6),
                      (5, 1), (6, 1), (5, 2), (6, 2), (5, 3), (6, 3), (5, 4), (6, 4)
           = 2 if s = (1, 2), (1, 3), (1, 4), (2, 1), (2, 3), (2, 4), (3, 1), (3, 2),
                      (3, 4), (4, 1), (4, 2), (4, 3). ◄
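The point of Example 1A is that X is literally a function on S. A short Python sketch makes this concrete. It builds the 30 ordered descriptions with `itertools.permutations` and defines X as an ordinary function of a description s. The names `WHITE`, `S`, and `counts` are our own.

```python
from itertools import permutations

WHITE = {1, 2, 3, 4}   # balls 1-4 are white; balls 5 and 6 are not

# Sample description space: ordered 2-tuples drawn without replacement from 6 balls.
S = list(permutations(range(1, 7), 2))

def X(s):
    """The random variable of Example 1A: number of white balls in description s."""
    return sum(1 for ball in s if ball in WHITE)

counts = {x: sum(1 for s in S if X(s) == x) for x in (0, 1, 2)}
print(len(S), counts)  # 30 {0: 2, 1: 16, 2: 12}
```

The counts match the three groups of descriptions listed in (1.1).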


EXERCISE 


1.1. Show that the following quantities are random variables by explaining 
how they may be defined as functions on a probability space: 
(i) The sum of 2 dice that are tossed independently. 
(ii) The number of times a coin is tossed until a head appears for the first 
time. 
(iii) The second digit in the decimal expansion of a number chosen on the 
unit interval in accordance with a uniform probability law. 
(iv) The absolute value of a number chosen on the real line in accordance 
with a normal probability law. 
(v) The number of urns that contain balls bearing the same number, when
52 balls, numbered 1 to 52, are distributed, 1 to an urn, among 52 urns,
numbered 1 to 52.
(vi) The distance from the origin of a 2-tuple (x_1, x_2) in the plane chosen
in accordance with a known probability law, specified by the probability
density function f(x_1, x_2).


2. DESCRIBING A RANDOM VARIABLE 


Although, by definition, a random variable X is a function on a
probability space, in probability theory we are rarely concerned with the
functional form of X, for we are not interested in computing the value X(s)
that the function X assumes at any individual member s of the sample
description space S on which X is defined. Indeed, we do not usually
wish to know the space S on which X is defined. Rather, we are interested
in the probability that an observed value of the random variable X will lie
in a given set B. We are interested in a random variable as a mechanism
that gives rise to a numerical valued random phenomenon, and the
questions we shall ask about a random variable X are precisely the same
as those asked about numerical valued random phenomena. Similarly, the
techniques we use to describe random variables are precisely the same as
those used to describe numerical valued random phenomena.


To begin with, we define the probability function of a random variable X,
denoted by P_X[·], as a set function defined for every Borel set B of real
numbers, whose value P_X[B] is the probability that X is in B. We
sometimes write the intuitively meaningful expression P[X is in B] for the
mathematically correct expression P_X[B]. Similarly, we adopt the
following expressions for any real numbers a, b, and x:

(2.1) $P[a < X \le b] = P_X[\{\text{real numbers } x' : a < x' \le b\}]$
      $P[X \le x] = P_X[\{\text{real numbers } x' : x' \le x\}]$
      $P[X = x] = P_X[\{\text{real numbers } x' : x' = x\}] = P_X[\{x\}]$.

One obtains the probability function P_X[·] of the random variable X
from the probability function P[·], which exists on the sample description
space S on which X is defined as a function, by means of the following
basic formula: for any Borel set B of real numbers

(2.2) $P_X[B] = P[\{s : X(s) \text{ is in } B\}]$.

Equation (2.2) represents the definition of P_X[B]; it is clear that it embodies
the intuitive meaning of P_X[B] given above, since the function X will have
an observed value lying in the set B if and only if the observed value s of
the underlying random phenomenon is such that X(s) is in B.


► Example 2A. The probability function of the number of white balls in
a sample. To illustrate the use of (2.2), let us compute the probability
function of the random variable X defined by (1.1). Assuming equally
likely descriptions on S, one determines for any set B of real numbers that
the value of P_X[B] depends on the intersection of B with the set {0, 1, 2}:

B ∩ {0, 1, 2}:   ∅    {0}    {1}    {2}    {0, 1}   {0, 2}   {1, 2}   {0, 1, 2}
P_X[B]:          0    1/15   8/15   6/15   9/15     7/15     14/15    1         ◄
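The table in Example 2A can be reproduced directly from the basic formula (2.2). The sketch below counts the descriptions s with X(s) in B and divides by 30, using exact fractions. As a simplification, B is represented by any Python container that supports `in`. A genuine Borel set such as an interval would call for a membership predicate instead.

```python
from fractions import Fraction
from itertools import permutations

S = list(permutations(range(1, 7), 2))   # the 30 equally likely descriptions

def X(s):
    """Number of white balls (balls 1-4) in the description s."""
    return sum(1 for ball in s if ball <= 4)

def P_X(B):
    """(2.2): P_X[B] = P[{s : X(s) is in B}], each description having probability 1/30."""
    favorable = sum(1 for s in S if X(s) in B)
    return Fraction(favorable, len(S))

for B in [set(), {0}, {1}, {2}, {0, 1}, {0, 2}, {1, 2}, {0, 1, 2}]:
    print(sorted(B), P_X(B))
```

`Fraction` prints values in lowest terms, so 6/15 and 9/15 appear as 2/5 and 3/5.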

We may represent the probability function P_X[·] of a random variable
as a distribution of a unit mass over the real line in such a way that the
amount of mass over any set B of real numbers is equal to the value P_X[B]
of the probability function of X at B. We have seen in Chapter 4 that a
distribution of probability mass may be specified in various ways by means
of probability mass functions, probability density functions, and distribution
functions. We now introduce these notions in connection with random
variables. However, the reader should bear constantly in mind that, as
mathematical functions defined on the real line, these notions have the
same mathematical properties, whether they arise from random variables
or from numerical valued random phenomena.


The probability law of a random variable X is defined as a probability
function P[·] over the real line that coincides with the probability function
P_X[·] of the random variable X. By definition, probability theory is
concerned with the statements that can be made about a random variable,
knowing only its probability law. Consequently, a proposition stated about
a probability function P[·] is, from the point of view of probability theory,
a proposition stated about all random variables X, Y, ..., whose
probability functions P_X[·], P_Y[·], ... coincide with P[·].

Two random variables X and Y are said to be identically distributed if
their probability functions are equal; that is, P_X[B] = P_Y[B] for all Borel
sets B.


The distribution function of a random variable X, denoted by F_X(·), is
defined for any real number x by

(2.3) $F_X(x) = P[X \le x]$.

The distribution function F_X(·) of a random variable possesses all the
properties stated in section 3 of Chapter 4 for the distribution function of a
numerical valued random phenomenon. The distribution function of X
uniquely determines the probability function of X.

The distribution function may be used to classify random variables into
types. A random variable X is said to be discrete or continuous, depending on
whether its distribution function F_X(·) is discrete or continuous.

The probability mass function of a random variable X, denoted by p_X(·),
is a function whose value p_X(x) at any real number x represents the
probability that the observed value of the random variable X will be equal
to x; in symbols,

(2.4) $p_X(x) = P[X = x] = P_X[\{x' : x' = x\}]$.

A real number x for which p_X(x) is positive is called a probability mass
point of the random variable X. From the distribution function F_X(·) one
may obtain the probability mass function p_X(·) by

(2.5) $p_X(x) = F_X(x) - \lim_{a \to x^-} F_X(a)$.

A random variable X is discrete if the sum of the probability mass
function over the points at which it is positive (there are at most a countably
infinite number of such points) is equal to 1; in symbols, X is discrete if

(2.6) $\sum_{\text{over all points } x \text{ such that } p_X(x) > 0} p_X(x) = 1$.

In other words, a random variable X is discrete if, when one distributes a
unit mass over the infinite line in accordance with the probability function
P_X[·], one does so by attaching a positive mass p_X(x) to each of a finite or a
countably infinite number of points.

If a random variable X is discrete, it suffices to know its probability mass
function p_X(·) in order to know its probability function P_X[·], for we have
the following formula expressing P_X[·] in terms of p_X(·). If X is discrete,
then for any Borel set B of real numbers

(2.7) $P_X[B] = P[X \text{ is in } B] = \sum_{\substack{\text{over all points } x \text{ in } B \\ \text{such that } p_X(x) > 0}} p_X(x)$.

Thus, for a discrete random variable X, to evaluate the probability P_X[B]
that the random variable X will have an observed value lying in B, one has
only to list the probability mass points of X which lie in B. One then adds
the probability masses attached to these probability mass points to obtain
P_X[B].

The distribution function of a discrete random variable X is given in
terms of its probability mass function by

(2.8) $F_X(x) = \sum_{\substack{\text{over all points } x' \le x \\ \text{such that } p_X(x') > 0}} p_X(x')$.

The distribution function F_X(·) of a discrete random variable X is what
might be called a piecewise constant or "step" function, as diagrammed in
Fig. 3A of Chapter 4. It consists of a series of horizontal lines over the
intervals between probability mass points; at a probability mass point x,
the graph of F_X(·) jumps upward by an amount p_X(x).
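Formulas (2.8) and (2.5) are inverse operations for a discrete random variable, and a small sketch shows this concretely. It takes the probability mass function of Example 2A, builds the step function F_X by summation, and recovers each jump. The helper `jump` is our own. It approximates the left limit in (2.5) by F_X(x − ε) for a tiny ε. This approximation is exact here because ε is smaller than the gap between mass points.

```python
from fractions import Fraction

# Probability mass function of X from Example 2A.
p_X = {0: Fraction(1, 15), 1: Fraction(8, 15), 2: Fraction(6, 15)}

def F_X(x):
    """(2.8): F_X(x) is the sum of p_X(x') over mass points x' <= x."""
    return sum((p for point, p in p_X.items() if point <= x), Fraction(0))

def jump(x, eps=Fraction(1, 10**9)):
    """Approximates (2.5), F_X(x) minus its left limit at x, by F_X(x) - F_X(x - eps)."""
    return F_X(x) - F_X(x - eps)

for x in [-1, 0, Fraction(1, 2), 1, 2, 3]:
    print(x, F_X(x), jump(x))
```

The jumps are nonzero only at the mass points 0, 1, 2, where they equal p_X(x).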


► Example 2B. A random variable X has a binomial distribution with
parameters n and p if it is a discrete random variable whose probability
mass function p_X(·) is given by, for any real number x,

(2.9) $p_X(x) = \dbinom{n}{x} p^x (1-p)^{n-x}$ if $x = 0, 1, \ldots, n$; $p_X(x) = 0$ otherwise.

Thus for a random variable X, which has a binomial distribution with
parameters n = 6 and p = 1/3,

$P[1 < X \le 2] = \dbinom{6}{2}\left(\tfrac{1}{3}\right)^2\left(\tfrac{2}{3}\right)^4 = 0.3292$. ◄
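Example 2B applies (2.7) to the binomial probability mass function (2.9): one sums p_X(x) over the mass points lying in (1, 2]. The sketch below does this with exact rational arithmetic, so the result can be compared with the value 0.3292 quoted in the text. The function name `binomial_pmf` is our own.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, n, p):
    """(2.9): C(n, x) p^x (1-p)^(n-x) for x = 0, 1, ..., n, and 0 otherwise."""
    if x not in range(n + 1):
        return 0
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 6, Fraction(1, 3)
# (2.7): sum the masses at the mass points in (1, 2]; only x = 2 qualifies.
prob = sum(binomial_pmf(x, n, p) for x in range(n + 1) if 1 < x <= 2)
print(prob, float(prob))  # 80/243, about 0.3292
```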


► Example 2C. Identically distributed random variables. Some insight
into the notion of identically distributed random variables may be gained
by considering the following simple example of two random variables that
are distinct as functions and yet are identically distributed. Suppose one
is tossing a fair die; consider the random variables X and Y, defined as
follows:

Value of X   if outcome of die is      Value of Y   if outcome of die is
    2            1, 2, 3                   2            4, 5, 6
    1            4, 5                      1            2, 3
    0            6                         0            1

It is clear that both X and Y are discrete random variables, whose
probability mass functions agree for all x; indeed, p_X(2) = p_Y(2) = 1/2,
p_X(1) = p_Y(1) = 1/3, p_X(0) = p_Y(0) = 1/6, and p_X(x) = p_Y(x) = 0 for
x ≠ 0, 1, or 2. Consequently, the probability functions P_X[B] and P_Y[B]
agree for all sets B. ◄
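Example 2C separates two ideas: X and Y are different functions on the sample space, yet they have the same probability law. The sketch below encodes each random variable as a Python dictionary from die face to value. The helper `pmf` is our own; it assigns probability 1/6 to each face, as for a fair die.

```python
from fractions import Fraction

faces = range(1, 7)   # fair die: each face has probability 1/6
X = {1: 2, 2: 2, 3: 2, 4: 1, 5: 1, 6: 0}
Y = {1: 0, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2}

def pmf(rv):
    """Probability mass function of a random variable given as a face -> value map."""
    out = {}
    for face in faces:
        out[rv[face]] = out.get(rv[face], 0) + Fraction(1, 6)
    return out

print(X == Y)            # False: different functions on the sample space
print(pmf(X) == pmf(Y))  # True: identical probability mass functions
```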


If a random variable X is continuous, there exists a nonnegative function
f_X(·), called the probability density function of the random variable X,
which has the following property: for any Borel set B of real numbers

(2.10) $P_X[B] = P[X \text{ is in } B] = \int_B f_X(x)\,dx$.

In words, for a continuous random variable X, once the probability
density function f_X(·) is known, the value P_X[B] of the probability function
at any Borel set B may be obtained by integrating the probability density
function f_X(·) over the set B.

The distribution function F_X(·) of a continuous random variable is given
in terms of its probability density function by

(2.11) $F_X(x) = \int_{-\infty}^{x} f_X(x')\,dx'$.

In turn, the probability density function of a continuous random variable
can be obtained from its distribution function by differentiation:

(2.12) $f_X(x) = \dfrac{d}{dx}\,F_X(x)$, at all points x at which the derivative exists.


► Example 2D. A random variable X is said to be normally distributed
if constants m and σ exist, where −∞ < m < ∞ and σ > 0, such that the
probability density function f_X(·) is given by, for any real number x,

(2.13) $f_X(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-m}{\sigma}\right)^2}$.

Then for any real numbers a and b

(2.14) $P[a < X \le b] = \int_a^b f_X(x)\,dx = \Phi\!\left(\dfrac{b-m}{\sigma}\right) - \Phi\!\left(\dfrac{a-m}{\sigma}\right)$.

For a random variable X, which is normally distributed with parameters
m = 2 and σ = 2,

$P[1 < X \le 2] = \Phi\!\left(\tfrac{2-2}{2}\right) - \Phi\!\left(\tfrac{1-2}{2}\right) = \Phi(0) - \Phi(-0.5) = 0.1915$. ◄
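Formula (2.14) needs only the standard normal distribution function Φ. The sketch below computes Φ through the identity Φ(z) = (1 + erf(z/√2))/2, using Python's `math.erf` in place of a printed table. It reproduces the number 0.1915 from Example 2D. The function names are our own.

```python
import math

def Phi(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_interval_prob(a, b, m, sigma):
    """(2.14): P[a < X <= b] = Phi((b - m)/sigma) - Phi((a - m)/sigma)."""
    return Phi((b - m) / sigma) - Phi((a - m) / sigma)

print(round(normal_interval_prob(1, 2, m=2, sigma=2), 4))  # 0.1915
```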


We conclude this section by making explicit mention of our conventions
concerning the use of the letters p, f, and F, and the subscripts X, Y, ....
We shall always use p(·) to denote a probability mass function and then add
as a subscript the random variable (which could be denoted by X, Y, Z, U,
V, W, etc.) of which it is the probability mass function. Thus, p_U(·)
denotes the probability mass function of the random variable U, whereas
p_U(u) denotes the value of p_U(·) at the point u. Similarly, we write f_X(·),
f_Y(·), f_Z(·), f_U(·), f_V(·), f_W(·) to denote the probability density function,
respectively, of X, Y, Z, U, V, W. Similarly, we write F_X(·), F_Y(·), F_Z(·),
F_U(·), F_V(·), F_W(·) to denote the distribution function, respectively, of
X, Y, Z, U, V, W.


EXERCISES 


In exercises 2.1 to 2.8 describe the probability law of the random variable
given.


2.1. The number of aces in a hand of 13 cards drawn without replacement
from a bridge deck.

2.2. The sum of numbers on 2 balls drawn with replacement (without
replacement) from an urn containing 6 balls, numbered 1 to 6.

2.3. The maximum of the numbers on 2 balls drawn with replacement (without
replacement) from an urn containing 6 balls, numbered 1 to 6.

2.4. The number of white balls drawn in a sample of size 2 drawn with replace-
ment (without replacement) from an urn containing 6 balls, of which 4
are white.

2.5. The second digit in the decimal expansion of a number chosen on the unit
interval in accordance with a uniform probability law.

2.6. The number of times a fair coin is tossed until heads appears (i) for the
first time, (ii) for the second time, (iii) for the third time.

2.7. The number of cards drawn without replacement from a deck of 52 cards
until (i) a spade appears, (ii) an ace appears.




2.8. The number of balls in the first urn if 10 distinguishable balls are
distributed in 4 urns in such a manner that each ball is equally likely to be
placed in any urn.


In exercises 2.9 to 2.16 find P[1 < X < 2] for the random variable X
described.


2.9. X is normally distributed with parameters m = 1 and σ = 1.
2.10. X is Poisson distributed with parameter λ = 1.
2.11. X obeys a binomial probability law with parameters n = 10 and p = 0.1.
2.12. X obeys an exponential probability law with parameter λ = 1.
2.13. X obeys a geometric probability law with parameter p = 1.
2.14. X obeys a hypergeometric probability law with parameters N = 100,
p = 0.1, n = 10.
2.15. X is uniformly distributed over the interval 4 toż.
2.16. X is Cauchy distributed with parameters α = 1 and β = 1.


3. AN EXAMPLE, TREATED FROM THE POINT OF VIEW OF
NUMERICAL n-TUPLE VALUED RANDOM PHENOMENA


In the next two sections we discuss an example that illustrates the need
to introduce various concepts concerning random variables, which will, in
turn, be presented in the course of the discussion. We begin in this section
by discussing the example in terms of the notion of a numerical valued
random phenomenon in order to show the similarities and differences
between this notion and that of a random variable.

Let us consider a commuter who is in the habit of taking a train to the
city; the time of departure from the station is given in the railroad
timetable as 7:55 A.M. However, the commuter notices that the actual time of
departure is a random phenomenon, varying between 7:55 and 8 A.M. Let
us assume that the probability law of the random phenomenon is specified
by a probability density function $f_1(\cdot)$; further, let us assume

(3.1)   $f_1(x_1) = \tfrac{2}{25}(5 - x_1)$  for $0 \le x_1 \le 5$;   $f_1(x_1) = 0$  otherwise,

in which $x_1$ represents the number of minutes after 7:55 A.M. that the
train departs from the station. Similarly, although the commuter leaves home
at 7:30 A.M. every day, his time of arrival at the station is a random
phenomenon, varying between 7:55 and 8 A.M. Let us suppose that the
probability law of this random phenomenon is specified by a probability
density function $f_2(\cdot)$; further, let us assume that $f_2(\cdot)$ is of the same
functional form as $f_1(\cdot)$, so that

(3.2)   $f_2(x_2) = \tfrac{2}{25}(5 - x_2)$  for $0 \le x_2 \le 5$;   $f_2(x_2) = 0$  otherwise,

in which $x_2$ represents the number of minutes after 7:55 A.M. that the
commuter arrives at the station.
The question now naturally arises: will the commuter catch the 7:55 A.M.
train? Of course, this question cannot be answered by us; but perhaps we
can answer the question: what is the probability that the commuter will
catch the 7:55 A.M. train?

Before any attempt can be made to answer this question, we must
express mathematically, as a set on a sample description space, the random
event described verbally as the event that the commuter catches the train.
Further, to compute the probability of the event, a probability function on
the sample description space must be defined.

As our sample description space $S$, we take the space of 2-tuples $(x_1, x_2)$
of real numbers, where $x_1$ represents the time (in minutes after 7:55 A.M.)
at which the train departs from the station, and $x_2$ denotes the time (in
minutes after 7:55 A.M.) at which the commuter arrives at the station. The
event $A$ that the man catches the train is then given as a set of sample
descriptions by $A = \{(x_1, x_2): x_1 > x_2\}$, since to catch the train his
arrival time $x_2$ must be less than the train's departure time $x_1$. The event $A$
is diagrammed in Fig. 3A.

We define next a probability function $P[\cdot]$ on the events in $S$. To do this,
we use the considerations of section 7, Chapter 4, concerning numerical
2-tuple valued random phenomena. In particular, let us suppose that the
probability function $P[\cdot]$ is specified by a 2-dimensional probability density
function $f(\cdot,\cdot)$. From a knowledge of $f(\cdot,\cdot)$ we may compute the
probability $P[A]$ that the commuter will catch his train by the formula

(3.3)   $P[A] = \iint_A f(x_1, x_2)\,dx_1\,dx_2 = \int_{-\infty}^{\infty} dx_2 \int_{x_2}^{\infty} dx_1\, f(x_1, x_2) = \int_{-\infty}^{\infty} dx_1 \int_{-\infty}^{x_1} dx_2\, f(x_1, x_2),$

in which the second and third equations follow by the usual rules of
calculus for evaluating double integrals (or integrals over the plane) by
means of iterated (or repeated) single integrals.

We next determine whether the function $f(\cdot,\cdot)$ is specified by our having
specified the probability density functions $f_1(\cdot)$ and $f_2(\cdot)$ by (3.1) and (3.2).


Fig. 3A. The event $A$ that the man catches the train, represented
as a set of points in the $(x_1, x_2)$-plane.
More generally, we consider the question: what relationship exists between
the individual probability density functions $f_1(\cdot)$ and $f_2(\cdot)$ and the joint
probability density function $f(\cdot,\cdot)$? We show first that from a knowledge
of $f(\cdot,\cdot)$ one may obtain a knowledge of $f_1(\cdot)$ and $f_2(\cdot)$ by the formulas, for
all real numbers $x_1$ and $x_2$,

(3.4)   $f_1(x_1) = \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_2, \qquad f_2(x_2) = \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_1.$


Conversely, we show by a general example that from a knowledge of $f_1(\cdot)$
and $f_2(\cdot)$ one cannot obtain a knowledge of $f(\cdot,\cdot)$, since $f(\cdot,\cdot)$ is not uniquely
determined by $f_1(\cdot)$ and $f_2(\cdot)$; more precisely, we show that to given probability
density functions $f_1(\cdot)$ and $f_2(\cdot)$ there exists an infinity of functions $f(\cdot,\cdot)$
that satisfy (3.4) with respect to $f_1(\cdot)$ and $f_2(\cdot)$.

To prove (3.4), let $F_1(\cdot)$ and $F_2(\cdot)$ be the distribution functions of the
first and second random phenomena under consideration; in the example
discussed, $F_1(\cdot)$ is the distribution function of the departure time of the
train from the station, and $F_2(\cdot)$ is the distribution function of the arrival
time of the man at the station. We may obtain expressions for $F_1(\cdot)$ and
$F_2(\cdot)$ in terms of $f(\cdot,\cdot)$, for $F_1(x_1)$ is equal to the probability, according to
the probability function $P[\cdot]$, of the set $\{(x_1', x_2'): x_1' \le x_1,\ -\infty < x_2' < \infty\}$,
and similarly $F_2(x_2) = P[\{(x_1', x_2'): -\infty < x_1' < \infty,\ x_2' \le x_2\}]$.
Consequently,

(3.5)   $F_1(x_1) = \int_{-\infty}^{x_1} dx_1' \int_{-\infty}^{\infty} dx_2'\, f(x_1', x_2'), \qquad F_2(x_2) = \int_{-\infty}^{x_2} dx_2' \int_{-\infty}^{\infty} dx_1'\, f(x_1', x_2').$


We next use the fact that

(3.6)   $\frac{d}{dx_1} F_1(x_1) = f_1(x_1), \qquad \frac{d}{dx_2} F_2(x_2) = f_2(x_2).$

By differentiation of (3.5), in view of (3.6), we obtain (3.4).

Conversely, given any two probability density functions $f_1(\cdot)$ and $f_2(\cdot)$,
let us show how one may find many probability density functions $f(\cdot,\cdot)$
to satisfy (3.4). Let $A$ be a positive number. Choose a finite nonempty
interval $a_1$ to $b_1$ such that $f_1(x_1) \ge A$ for $a_1 \le x_1 \le b_1$. Similarly, choose
a finite nonempty interval $a_2$ to $b_2$ such that $f_2(x_2) \ge A$ for $a_2 \le x_2 \le b_2$.
Define a function of two variables $h(\cdot,\cdot)$ by

(3.7)   $h(x_1, x_2) = A^2 \sin\left[\frac{2\pi}{b_1 - a_1}\left(x_1 - \frac{a_1 + b_1}{2}\right)\right] \sin\left[\frac{2\pi}{b_2 - a_2}\left(x_2 - \frac{a_2 + b_2}{2}\right)\right]$
        if both $a_1 \le x_1 \le b_1$ and $a_2 \le x_2 \le b_2$;
        $h(x_1, x_2) = 0$  otherwise.

Clearly, by construction, for all $x_1$ and $x_2$

(3.8)   $|h(x_1, x_2)| \le f_1(x_1) f_2(x_2), \qquad \int_{-\infty}^{\infty} h(x_1, x_2)\,dx_1 = \int_{-\infty}^{\infty} h(x_1, x_2)\,dx_2 = 0.$

Define the function $f(\cdot,\cdot)$ for any real numbers $x_1$ and $x_2$ by

(3.9)   $f(x_1, x_2) = f_1(x_1) f_2(x_2) + h(x_1, x_2).$

It may be verified, in view of (3.8), that $f(\cdot,\cdot)$ is a probability density
function satisfying (3.4).
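As a numerical sanity check of the construction (3.7) to (3.9), the following Python sketch takes $f_1(\cdot) = f_2(\cdot)$ to be the density (3.1) and, as purely illustrative choices not made in the text, $A = 0.2$ with $a_1 = a_2 = 0$ and $b_1 = b_2 = 2.5$ (on that interval $\tfrac{2}{25}(5 - x) \ge 0.2$). It checks by the midpoint rule that the perturbed joint density still has the marginals required by (3.4), and that it stays nonnegative on a grid.

```python
import math

def f1(x):
    """Density (3.1): (2/25)(5 - x) on [0, 5], zero elsewhere."""
    return 2.0 / 25.0 * (5.0 - x) if 0.0 <= x <= 5.0 else 0.0

f2 = f1  # (3.2) has the same functional form

# Illustrative (hypothetical) choices: f1 and f2 are >= A on [0, 2.5].
A = 0.2
a1, b1 = 0.0, 2.5
a2, b2 = 0.0, 2.5

def h(x1, x2):
    """Perturbation of the form (3.7): amplitude A**2, integral zero in each variable."""
    if a1 <= x1 <= b1 and a2 <= x2 <= b2:
        s1 = math.sin(2 * math.pi / (b1 - a1) * (x1 - (a1 + b1) / 2))
        s2 = math.sin(2 * math.pi / (b2 - a2) * (x2 - (a2 + b2) / 2))
        return A ** 2 * s1 * s2
    return 0.0

def f(x1, x2):
    """Joint density (3.9): product of the marginals plus the perturbation."""
    return f1(x1) * f2(x2) + h(x1, x2)

def midpoint(g, lo, hi, n=2000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    step = (hi - lo) / n
    return step * sum(g(lo + (i + 0.5) * step) for i in range(n))

for x in (0.3, 1.0, 2.0, 4.0):
    m1 = midpoint(lambda t: f(x, t), 0.0, 5.0)  # integrate out x2; compare with f1(x)
    m2 = midpoint(lambda t: f(t, x), 0.0, 5.0)  # integrate out x1; compare with f2(x)
    print(x, round(m1, 6), round(f1(x), 6), round(m2, 6), round(f2(x), 6))

grid = [0.05 * i for i in range(101)]
print("min of f on grid:", min(f(u, v) for u in grid for v in grid))
```

Since $h(\cdot,\cdot)$ is not identically zero, this $f(\cdot,\cdot)$ differs from $f_1(x_1) f_2(x_2)$ while sharing its marginals; varying $A$ or the intervals gives further examples.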


We now return to the question of how to determine $f(\cdot,\cdot)$. There is one
(and, in general, only one) circumstance in which the individual probability
density functions $f_1(\cdot)$ and $f_2(\cdot)$ determine the joint probability density function
$f(\cdot,\cdot)$, namely, when the respective random phenomena, whose probability
density functions are $f_1(\cdot)$ and $f_2(\cdot)$, are independent.

We define two random phenomena as independent, letting $P_1[\cdot]$ and $P_2[\cdot]$
denote their respective probability functions and $P[\cdot]$ their joint probability
function, if it holds that for all real numbers $a_1$, $b_1$, $a_2$, and $b_2$

(3.10)  $P[\{(x_1, x_2): a_1 < x_1 \le b_1,\ a_2 < x_2 \le b_2\}] = P_1[\{x_1: a_1 < x_1 \le b_1\}]\,P_2[\{x_2: a_2 < x_2 \le b_2\}].$


Equivalently, two random phenomena are independent, letting $F_1(\cdot)$ and
$F_2(\cdot)$ denote their respective distribution functions and $F(\cdot,\cdot)$ their joint
distribution function, if it holds that for all real numbers $x_1$ and $x_2$

(3.11)  $F(x_1, x_2) = F_1(x_1) F_2(x_2).$


Equivalently, two continuous random phenomena are independent, letting
$f_1(\cdot)$ and $f_2(\cdot)$ denote their respective probability density functions and
$f(\cdot,\cdot)$ their joint probability density function, if it holds that for all real numbers
$x_1$ and $x_2$

(3.12)  $f(x_1, x_2) = f_1(x_1) f_2(x_2).$


Equivalently, two discrete random phenomena are independent, letting
$p_1(\cdot)$ and $p_2(\cdot)$ denote their respective probability mass functions and $p(\cdot,\cdot)$
their joint probability mass function, if it holds that for all real numbers $x_1$
and $x_2$

(3.13)  $p(x_1, x_2) = p_1(x_1) p_2(x_2).$


The equivalence of the foregoing statements concerning independence
may be shown more or less with ease by using the relationships developed
in Chapter 4; indications of the proofs are contained in section 6.

Independence may also be defined in terms of the notion of an event
depending on a phenomenon, which is analogous to the notion of an event
depending on a trial developed in section 2 of Chapter 3. An event $A$ is
said to depend on a random phenomenon if a knowledge of the outcome of the
phenomenon suffices to determine whether or not the event $A$ has occurred.
We then define two random phenomena as independent if, for any two
events $A_1$ and $A_2$ depending, respectively, on the first and second
phenomenon, the probability of the intersection of $A_1$ and $A_2$ is equal to
the product of their probabilities:

(3.14)  $P[A_1 A_2] = P[A_1]\,P[A_2].$


As shown in section 2 of Chapter 3, two random phenomena are 
independent if and only if a knowledge of the outcome of one of the 
phenomena does not affect the probability of any event depending upon 
the other phenomenon. 

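Let us now return to the problem of the commuter catching his train,
and let us assume that the commuter's arrival time and the train's departure
time are independent random phenomena. Then (3.12) holds, and from
(3.3)

(3.15)  $P[A] = \int_{-\infty}^{\infty} dx_2\, f_2(x_2) \int_{x_2}^{\infty} dx_1\, f_1(x_1) = \int_{-\infty}^{\infty} dx_1\, f_1(x_1) \int_{-\infty}^{x_1} dx_2\, f_2(x_2).$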

Since $f_1(\cdot)$ and $f_2(\cdot)$ are specified by (3.1) and (3.2), respectively, the
probability $P[A]$ that the commuter will catch his train can now be
computed by evaluating the integrals in (3.15). However, in the present
example there is a special feature that makes it possible to
evaluate $P[A]$ without any laborious calculation.
The reader may have noticed that the probability density functions $f_1(\cdot)$
and $f_2(\cdot)$ have the same functional form. If we define $f(\cdot)$ by $f(x) = \tfrac{2}{25}(5 - x)$
or 0, depending on whether $0 \le x \le 5$ or otherwise, we find that
$f_1(x) = f_2(x) = f(x)$ for all real numbers $x$. In terms of $f(\cdot)$, we may write
(3.15), making the change of variable $x_1' = x_2$ and $x_2' = x_1$ in the second
integral,

(3.16)  $P[A] = \int_{-\infty}^{\infty} dx_2\, f(x_2) \int_{x_2}^{\infty} dx_1\, f(x_1) = \int_{-\infty}^{\infty} dx_2'\, f(x_2') \int_{-\infty}^{x_2'} dx_1'\, f(x_1').$

Since the primes merely rename the variables of integration, by adding the
two integrals in (3.16) it follows that

        $2P[A] = \int_{-\infty}^{\infty} dx_2\, f(x_2) \int_{-\infty}^{\infty} dx_1\, f(x_1) = 1.$

We conclude that the probability $P[A]$ that the man will catch his train is
equal to $\tfrac{1}{2}$.
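The symmetry argument can be cross-checked numerically. The Python sketch below (an illustration, not part of the text) evaluates the first iterated integral in (3.15) with $f_1 = f_2 = f$. It uses the inner integral in closed form, $\int_x^{\infty} f(x_1)\,dx_1 = (5 - x)^2/25$ for $0 \le x \le 5$, and the midpoint rule for the outer one.

```python
def f(x):
    """Common density of departure and arrival times, (3.1) and (3.2), in minutes after 7:55."""
    return 2.0 / 25.0 * (5.0 - x) if 0.0 <= x <= 5.0 else 0.0

def tail(x):
    """P[departure time > x]: the integral of f from x to infinity."""
    if x <= 0.0:
        return 1.0
    if x >= 5.0:
        return 0.0
    return (5.0 - x) ** 2 / 25.0

def prob_catch(n=10_000):
    """Midpoint rule for the outer integral of (3.15): integral of f(x2) * tail(x2) dx2."""
    step = 5.0 / n
    return step * sum(f((i + 0.5) * step) * tail((i + 0.5) * step) for i in range(n))

print(round(prob_catch(), 6))  # 0.5
```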


EXERCISES 


3.1. Consider the example in the text. Let the probability law of the train's
departure time be given by (3.1). However, assume that the man's arrival
time at the railroad station is uniformly distributed over the interval
7:55 to 8 A.M. Assume that the man's arrival time is independent of the
train's departure time. Find the probability of the event that the man
will catch the train.


3.2. Consider the example in the text. Assume that the train's departure time
and the man's arrival time are independent random phenomena, each
uniformly distributed over the interval 7:55 to 8 A.M. Find the probability
of the event that the man will catch the train.


4. THE SAME EXAMPLE TREATED FROM THE POINT 
OF VIEW OF RANDOM VARIABLES 


We now treat the example considered in the foregoing section in terms
of random variables. We shall see that the notion of a random variable
does not replace the idea of a numerical valued random phenomenon but
rather extends it.

We let $X_1$ and $X_2$ denote, respectively, the departure time of the train
and the arrival time of the commuter at the station. In order, with complete
rigor, to regard $X_1$ and $X_2$ as random variables, we must state the probability
spaces on which they are defined as functions. Let us first consider $X_1$.
We may regard $X_1$ as the identity function (so that $X_1(x_1) = x_1$ for all $x_1$)
on a real line $R_1$, on which a probability distribution (that is, a
distribution of probability mass) has been placed in accordance with the
probability density function $f_1(\cdot)$ given by (3.1). Or we may define $X_1$ as a
function on the space $S$ of 2-tuples $(x_1, x_2)$ of real numbers, on which a
probability distribution has been placed in accordance with the probability
density function $f(\cdot,\cdot)$ given by (3.12); in this case we define
$X_1(s) = X_1((x_1, x_2)) = x_1$. Similarly, we may regard $X_2$ as either the identity
function on a real line $R_2$, on which a probability distribution has been
placed in accordance with the probability density function $f_2(\cdot)$ given by
(3.2), or as the function with values $X_2((x_1, x_2)) = x_2$, defined on the
probability space $S$. In order to consider $X_1$ and $X_2$ in the same context,
they must be defined on the same probability space. Consequently, we
regard $X_1$ and $X_2$ as being defined on $S$.

It should be noted that, no matter how $X_1$ and $X_2$ are defined as functions,
the individual probability laws of $X_1$ and $X_2$ are specified by the probability
density functions $f_{X_1}(\cdot)$ and $f_{X_2}(\cdot)$, with values at any real number $x$

(4.1)   $f_{X_1}(x) = f_{X_2}(x) = \tfrac{2}{25}(5 - x)$  for $0 \le x \le 5$;   $= 0$  otherwise.

Consequently, the random variables $X_1$ and $X_2$ are identically distributed.


We now turn our attention to the problem of computing the probability
that the man will catch the train. In the previous section we reduced this
problem to one involving the computation of the probability of a certain
event (set) on a probability space. In this section we reduce the problem
to one involving the computation of the distribution function of a random
variable; by so doing, we solve not only the given problem but also a
number of related problems.


Let $Y = X_1 - X_2$ denote the difference between the train's departure
time $X_1$ and the man's arrival time $X_2$. It is clear that the man catches the
train if and only if $Y > 0$. Therefore, the probability that the man will
catch the train is equal to $P[Y > 0]$. In order for $P[Y > 0]$ to be a
meaningful expression, it is necessary that $Y$ be a random variable, which
is to say that $Y$ is a function on some probability space. This will be the
case if and only if the random variables $X_1$ and $X_2$ are defined as functions
on the same probability space. Consequently, we must regard $X_1$ and $X_2$
as functions on the probability space $S$, defined in the second paragraph
of this section. Then $Y$ is a function on the probability space $S$, and
$P[Y > 0]$ is meaningful. Indeed, we may compute the distribution function
$F_Y(\cdot)$ of $Y$, defined for any real number $y$ by

(4.2)   $F_Y(y) = P[Y \le y] = P[\{s: Y(s) \le y\}].$

Then $P[Y > 0] = 1 - F_Y(0)$.
To compute the distribution function $F_Y(\cdot)$ of $Y$, there are two methods
available. In one method we use the fact that we know the probability
space $S$ on which $Y$ is defined as a function and use (4.2). A second
method is to use only the fact that $Y$ is defined as a function of the random
variables $X_1$ and $X_2$. The second method requires the introduction of the
notion of the joint probability law of the random variables $X_1$ and $X_2$ and
is discussed in the next section. We conclude this section by obtaining
$F_Y(\cdot)$ by means of the first method.




As a function on the probability space $S$, $Y$ is given, at each 2-tuple
$(x_1, x_2)$, by $Y((x_1, x_2)) = x_1 - x_2$. Consequently, by (4.2), for any real
number $y$,

(4.3)   $F_Y(y) = P[\{(x_1, x_2): x_1 - x_2 \le y\}] = \iint_{\{(x_1, x_2):\, x_1 - x_2 \le y\}} f(x_1, x_2)\,dx_1\,dx_2 = \int_{-\infty}^{\infty} dx_1 \int_{x_1 - y}^{\infty} dx_2\, f(x_1, x_2).$

From (4.3) we obtain an expression for the probability density function
$f_Y(\cdot)$ of the random variable $Y$. In the second integration in (4.3), make
the change of variable $x_2' = x_1 - x_2$. Then

        $F_Y(y) = \int_{-\infty}^{\infty} dx_1 \int_{-\infty}^{y} dx_2'\, f(x_1, x_1 - x_2').$

By interchanging the order of integration, we have

(4.4)   $F_Y(y) = \int_{-\infty}^{y} dx_2' \int_{-\infty}^{\infty} dx_1\, f(x_1, x_1 - x_2').$

By differentiating the expression in (4.4) with respect to $y$, we obtain the
integrand of the integration with respect to $x_2'$, with $x_2'$ replaced by $y$; thus

(4.5)   $f_Y(y) = \frac{d}{dy} F_Y(y) = \int_{-\infty}^{\infty} dx_1\, f(x_1, x_1 - y).$


Equation (4.5) constitutes a general expression for the probability density
function of the random variable $Y$ defined on a space $S$ of 2-tuples $(x_1, x_2)$
by $Y((x_1, x_2)) = x_1 - x_2$, where a probability function has been specified on
$S$ by the probability density function $f(\cdot,\cdot)$.

To illustrate the use of (4.5), let us consider again the probability density
functions introduced in connection with the problem of the commuter
catching the train. The probability density function $f(\cdot,\cdot)$ is given by
(3.12) in terms of the functions $f_1(\cdot)$ and $f_2(\cdot)$, given by (3.1) and (3.2),
respectively.


In the case of independent phenomena, (4.5) becomes

(4.6)   $f_Y(y) = \int_{-\infty}^{\infty} dx_1\, f_1(x_1) f_2(x_1 - y) = \int_{-\infty}^{\infty} dx\, f_1(y + x) f_2(x).$

If further, as is the case here, the two random phenomena [with respective
probability density functions $f_1(\cdot)$ and $f_2(\cdot)$] are identically distributed, so
that, for all real numbers $x$, $f_1(x) = f_2(x) = f(x)$ for some function $f(\cdot)$,
then the probability density function $f_Y(\cdot)$ is an even function; that is,
$f_Y(-y) = f_Y(y)$ for all $y$. It then suffices to evaluate $f_Y(y)$ for $y \ge 0$.
One obtains, by using (3.1), (3.2), and (4.6),

(4.7)   $f_Y(y) = \left(\tfrac{2}{25}\right)^2 \int_0^{5 - y} (5 - x - y)(5 - x)\,dx = \left(\tfrac{2}{25}\right)^2 \frac{(5 - y)^2 (10 + y)}{6}$  if $0 \le y \le 5$;
        $f_Y(y) = 0$  if $y > 5$.

Therefore,

(4.8)   $f_Y(y) = \frac{4|y|^3 - 300|y| + 1000}{3750}$  if $|y| \le 5$;   $f_Y(y) = 0$  otherwise.

Consequently, $P[Y > 0] = \int_0^{\infty} f_Y(y)\,dy = \tfrac{1}{2}$.
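Formula (4.8) can be checked against a direct numerical evaluation of the convolution integral (4.6). In the Python sketch below (an illustration, not the book's method) the integral over $x$ is approximated by the midpoint rule on $[0, 5]$, outside of which $f_2$ vanishes. The closed form is then integrated over $y > 0$ to recover $P[Y > 0] = \tfrac{1}{2}$.

```python
def f(x):
    """Common density (3.1) = (3.2): (2/25)(5 - x) on [0, 5], zero elsewhere."""
    return 2.0 / 25.0 * (5.0 - x) if 0.0 <= x <= 5.0 else 0.0

def f_Y_numeric(y, n=4000):
    """(4.6): integral of f1(y + x) f2(x) dx, midpoint rule on [0, 5]."""
    step = 5.0 / n
    return step * sum(f(y + (i + 0.5) * step) * f((i + 0.5) * step) for i in range(n))

def f_Y_closed(y):
    """(4.8): (4|y|^3 - 300|y| + 1000) / 3750 for |y| <= 5, zero otherwise."""
    a = abs(y)
    return (4 * a ** 3 - 300 * a + 1000) / 3750 if a <= 5 else 0.0

for y in (-4.0, -1.0, 0.0, 2.5, 4.9):
    print(y, round(f_Y_numeric(y), 5), round(f_Y_closed(y), 5))

# P[Y > 0]: integrate the closed form over (0, 5] by the midpoint rule.
step = 5.0 / 10_000
p_positive = step * sum(f_Y_closed((i + 0.5) * step) for i in range(10_000))
print(round(p_positive, 6))  # 0.5
```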


EXERCISES

4.1. Consider the random variable $Y$ defined in the text. Find the probability
density function of $Y$ under the assumptions made in exercise 3.1.

4.2. Consider the random variable $Y$ defined in the text. Find its probability
density function under the assumptions made in exercise 3.2.


5. JOINTLY DISTRIBUTED RANDOM VARIABLES 

Two random variables, $X_1$ and $X_2$, are said to be jointly distributed if
they are defined as functions on the same probability space. It is then
possible to make joint probability statements about $X_1$ and $X_2$ (that is,
probability statements about the simultaneous behavior of the two random
variables). In this section we introduce the notions used to describe the
joint probability law of jointly distributed random variables.

The joint probability function, denoted by $P_{X_1,X_2}[\cdot]$, of two jointly
distributed random variables is defined for every Borel set $B$ of 2-tuples
of real numbers by

(5.1)   $P_{X_1,X_2}[B] = P[\{s \text{ in } S: (X_1(s), X_2(s)) \text{ is in } B\}],$

in which $S$ denotes the sample description space on which the random
variables $X_1$ and $X_2$ are defined and $P[\cdot]$ denotes the probability function
defined on $S$. In words, $P_{X_1,X_2}[B]$ represents the probability that the
2-tuple $(X_1, X_2)$ of observed values of the random variables will lie in the
set $B$. For brevity, we usually write

(5.2)   $P_{X_1,X_2}[B] = P[(X_1, X_2) \text{ is in } B]$

instead of (5.1). However, it should be kept constantly in mind that the
right-hand side of (5.2) is without mathematical content of its own; rather,
it is an intuitively meaningful concise way of writing the right-hand side of
(5.1).

It is useful to think of the joint probability function $P_{X_1,X_2}[\cdot]$ of two
jointly distributed random variables $X_1$ and $X_2$ as representing the
distribution of a unit amount of probability mass over a 2-dimensional plane on
which rectangular coordinates have been marked off, as in Fig. 7A of
Chapter 4, so that to any point in the plane there corresponds a 2-tuple
$(x_1, x_2)$ of real numbers representing it. For any Borel set $B$ of 2-tuples,
$P_{X_1,X_2}[B]$ represents the amount of probability mass distributed over the
set $B$.


We are particularly interested in knowing the value $P_{X_1,X_2}[B]$ for sets $B$
which are combinatorial product sets in the plane. A set $B$ is called a
combinatorial product set if it is of the form
$B = \{(x_1, x_2): x_1 \text{ is in } B_1 \text{ and } x_2 \text{ is in } B_2\}$ for some Borel sets $B_1$ and $B_2$
of real numbers. If $B$ is of this form, we then write, for brevity,
$P_{X_1,X_2}[B] = P[X_1 \text{ is in } B_1, X_2 \text{ is in } B_2]$.

In order to know the joint probability function $P_{X_1,X_2}[B]$ for all Borel
sets $B$ of 2-tuples, it suffices to know it for all infinite rectangle sets $B_{x_1,x_2}$,
where, for any two real numbers $x_1$ and $x_2$, we define the "infinite rectangle"
set

(5.3)   $B_{x_1,x_2} = \{(x_1', x_2'): x_1' \le x_1,\ x_2' \le x_2\}$

as the set consisting of all 2-tuples $(x_1', x_2')$ whose first component $x_1'$ is
less than or equal to the specified real number $x_1$ and whose second component
$x_2'$ is less than or equal to the specified real number $x_2$. To specify the joint
probability function of $X_1$ and $X_2$, it suffices to specify the joint distribution
function $F_{X_1,X_2}(\cdot,\cdot)$ of the random variables $X_1$ and $X_2$, defined for all real
numbers $x_1$ and $x_2$ by the equation

(5.4)   $F_{X_1,X_2}(x_1, x_2) = P[X_1 \le x_1, X_2 \le x_2] = P_{X_1,X_2}[B_{x_1,x_2}].$

In words, $F_{X_1,X_2}(x_1, x_2)$ represents the probability that the simultaneous
observation $(X_1, X_2)$ will have the property that $X_1 \le x_1$ and $X_2 \le x_2$.


In terms of the probability mass distributed over the plane of Fig. 7A
of Chapter 4, $F_{X_1,X_2}(x_1, x_2)$ represents the amount of mass in the "infinite
rectangle" $B_{x_1,x_2}$.

The reader should verify for himself the following important formula
[compare (7.3) of Chapter 4]: for any real numbers $a_1$, $a_2$, $b_1$, and $b_2$ such
that $a_1 < b_1$ and $a_2 < b_2$, the probability $P[a_1 < X_1 \le b_1, a_2 < X_2 \le b_2]$ that
the simultaneous observation $(X_1, X_2)$ will be such that $a_1 < X_1 \le b_1$ and
$a_2 < X_2 \le b_2$ may be given in terms of $F_{X_1,X_2}(\cdot,\cdot)$ by

(5.5)   $P[a_1 < X_1 \le b_1, a_2 < X_2 \le b_2] = F_{X_1,X_2}(b_1, b_2) + F_{X_1,X_2}(a_1, a_2) - F_{X_1,X_2}(a_1, b_2) - F_{X_1,X_2}(b_1, a_2).$
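Formula (5.5) is inclusion-exclusion on the four corners of the rectangle. The Python sketch below applies it to a joint distribution function chosen purely for illustration (it does not appear in the text): two independent exponential random variables with mean 1, for which $F_{X_1,X_2}(x_1, x_2) = (1 - e^{-x_1})(1 - e^{-x_2})$ when both arguments are positive. The result is compared with the product of the two one-dimensional interval probabilities, which must agree under independence.

```python
import math

def exp_cdf(x):
    """Distribution function of an exponential random variable with mean 1."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def F(x1, x2):
    """Illustrative joint distribution function: independent exponential(1) components."""
    return exp_cdf(x1) * exp_cdf(x2)

def rect_prob(F, a1, b1, a2, b2):
    """(5.5): P[a1 < X1 <= b1, a2 < X2 <= b2] computed from the joint distribution function F."""
    return F(b1, b2) + F(a1, a2) - F(a1, b2) - F(b1, a2)

p = rect_prob(F, 0.2, 0.7, 0.3, 1.1)
direct = (exp_cdf(0.7) - exp_cdf(0.2)) * (exp_cdf(1.1) - exp_cdf(0.3))
print(round(p, 6), round(direct, 6))
```

The function `rect_prob` only evaluates the joint distribution function at the four corners, so it applies unchanged to any $F_{X_1,X_2}(\cdot,\cdot)$, independent components or not.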


It is important to note that from a knowledge of the joint distribution
function $F_{X_1,X_2}(\cdot,\cdot)$ of two jointly distributed random variables one may
obtain the distribution functions $F_{X_1}(\cdot)$ and $F_{X_2}(\cdot)$ of each of the random
variables $X_1$ and $X_2$. We have the formula, for any real number $x_1$,

(5.6)   $F_{X_1}(x_1) = P[X_1 \le x_1] = P[X_1 \le x_1, X_2 < \infty] = \lim_{x_2 \to \infty} F_{X_1,X_2}(x_1, x_2) = F_{X_1,X_2}(x_1, \infty).$

Similarly, for any real number $x_2$,

(5.7)   $F_{X_2}(x_2) = \lim_{x_1 \to \infty} F_{X_1,X_2}(x_1, x_2) = F_{X_1,X_2}(\infty, x_2).$

In terms of the probability mass distributed over the plane by the joint
distribution function $F_{X_1,X_2}(\cdot,\cdot)$, the quantity $F_{X_1}(x_1)$ is equal to the
amount of mass in the half-plane that consists of all 2-tuples $(x_1', x_2')$ that
are to the left of, or on, the line with equation $x_1' = x_1$.

The function $F_{X_1}(\cdot)$ is called the marginal distribution function of the
random variable $X_1$ corresponding to the joint distribution function
$F_{X_1,X_2}(\cdot,\cdot)$. Similarly, $F_{X_2}(\cdot)$ is called the marginal distribution function
of $X_2$ corresponding to the joint distribution function $F_{X_1,X_2}(\cdot,\cdot)$.


We next define the joint probability mass function of two random
variables $X_1$ and $X_2$, denoted by $p_{X_1,X_2}(\cdot,\cdot)$, as a function of 2 variables,
with value, for any real numbers $x_1$ and $x_2$,

(5.8)   $p_{X_1,X_2}(x_1, x_2) = P[X_1 = x_1, X_2 = x_2] = P_{X_1,X_2}[\{(x_1', x_2'): x_1' = x_1,\ x_2' = x_2\}].$

It may be shown that there is only a finite or countably infinite number
of 2-tuples $(x_1, x_2)$ at which $p_{X_1,X_2}(x_1, x_2) > 0$. The jointly distributed
random variables $X_1$ and $X_2$ are said to be jointly discrete if the sum of the
joint probability mass function over the points $(x_1, x_2)$ where $p_{X_1,X_2}(x_1, x_2)$
is positive is equal to 1. If the random variables $X_1$ and $X_2$ are jointly
discrete, then they are individually discrete, with individual probability
mass functions, for any real numbers $x_1$ and $x_2$,

(5.9)   $p_{X_1}(x_1) = \sum_{\substack{\text{over all } x_2 \text{ such that} \\ p_{X_1,X_2}(x_1, x_2) > 0}} p_{X_1,X_2}(x_1, x_2), \qquad p_{X_2}(x_2) = \sum_{\substack{\text{over all } x_1 \text{ such that} \\ p_{X_1,X_2}(x_1, x_2) > 0}} p_{X_1,X_2}(x_1, x_2).$

Two jointly distributed random variables, $X_1$ and $X_2$, are said to be
jointly continuous if they are specified by a joint probability density
function.

Two jointly distributed random variables, $X_1$ and $X_2$, are said to be
specified by a joint probability density function if there is a nonnegative
Borel function $f_{X_1,X_2}(\cdot,\cdot)$, called the joint probability density function of $X_1$ and
$X_2$, such that for any Borel set $B$ of 2-tuples of real numbers the probability
$P[(X_1, X_2) \text{ is in } B]$ may be obtained by integrating $f_{X_1,X_2}(\cdot,\cdot)$ over $B$; in
symbols,

(5.10)  $P_{X_1,X_2}[B] = P[(X_1, X_2) \text{ is in } B] = \iint_B f_{X_1,X_2}(x_1', x_2')\,dx_1'\,dx_2'.$


By letting $B = B_{x_1,x_2}$ in (5.10), it follows that the joint distribution function
may be given, for any real numbers $x_1$ and $x_2$, by

(5.11)  $F_{X_1,X_2}(x_1, x_2) = \int_{-\infty}^{x_1} dx_1' \int_{-\infty}^{x_2} dx_2'\, f_{X_1,X_2}(x_1', x_2').$


Next, for any real numbers $a_1$, $b_1$, $a_2$, $b_2$ such that $a_1 < b_1$ and $a_2 < b_2$, one
may verify that

(5.12)  $P[a_1 < X_1 \le b_1, a_2 < X_2 \le b_2] = \int_{a_1}^{b_1} dx_1' \int_{a_2}^{b_2} dx_2'\, f_{X_1,X_2}(x_1', x_2').$


The joint probability density function may be obtained from the joint
distribution function by routine differentiation, since

(5.13)  $f_{X_1,X_2}(x_1, x_2) = \frac{\partial^2}{\partial x_1\,\partial x_2} F_{X_1,X_2}(x_1, x_2)$

at all 2-tuples $(x_1, x_2)$ where the partial derivatives on the right-hand side
of (5.13) are well defined.

If the random variables $X_1$ and $X_2$ are jointly continuous, then they are
individually continuous, with individual probability density functions given,
for any real numbers $x_1$ and $x_2$, by

(5.14)  $f_{X_1}(x_1) = \int_{-\infty}^{\infty} f_{X_1,X_2}(x_1, x_2)\,dx_2, \qquad f_{X_2}(x_2) = \int_{-\infty}^{\infty} f_{X_1,X_2}(x_1, x_2)\,dx_1.$

The reader should compare (5.14) with (3.4).
To prove (5.14), one uses the fact that by (5.6), (5.7), and (5.11),

        $F_{X_1}(x_1) = \int_{-\infty}^{x_1} dx_1' \int_{-\infty}^{\infty} dx_2'\, f_{X_1,X_2}(x_1', x_2'), \qquad F_{X_2}(x_2) = \int_{-\infty}^{x_2} dx_2' \int_{-\infty}^{\infty} dx_1'\, f_{X_1,X_2}(x_1', x_2').$


The foregoing notions extend at once to the case of $n$ random variables.
We list here the most important notations used in discussing $n$ jointly
distributed random variables $X_1, X_2, \ldots, X_n$. The joint probability
function for any Borel set $B$ of $n$-tuples is given by

(5.15)  $P_{X_1,X_2,\ldots,X_n}[B] = P[(X_1, X_2, \ldots, X_n) \text{ is in } B].$


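The joint distribution function for any real numbers $x_1, x_2, \ldots, x_n$ is given by

(5.16)  $F_{X_1,X_2,\ldots,X_n}(x_1, x_2, \ldots, x_n) = P[X_1 \le x_1, X_2 \le x_2, \ldots, X_n \le x_n].$

The joint probability density function (if the derivative below exists) is
given by

(5.17)  $f_{X_1,X_2,\ldots,X_n}(x_1, x_2, \ldots, x_n) = \frac{\partial^n}{\partial x_1\,\partial x_2 \cdots \partial x_n} F_{X_1,X_2,\ldots,X_n}(x_1, x_2, \ldots, x_n).$

The joint probability mass function is given by

(5.18)  $p_{X_1,X_2,\ldots,X_n}(x_1, x_2, \ldots, x_n) = P[X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n].$

A discrete joint probability law is specified by its probability mass function:
for any Borel set $B$ of $n$-tuples

(5.19)  $P_{X_1,X_2,\ldots,X_n}[B] = \sum_{\substack{\text{over all } (x_1, x_2, \ldots, x_n) \text{ in } B \text{ such that} \\ p_{X_1,X_2,\ldots,X_n}(x_1, x_2, \ldots, x_n) > 0}} p_{X_1,X_2,\ldots,X_n}(x_1, x_2, \ldots, x_n).$

A continuous joint probability law is specified by its probability density
function: for any Borel set $B$ of $n$-tuples

(5.20)  $P_{X_1,X_2,\ldots,X_n}[B] = \int \cdots \int_B f_{X_1,X_2,\ldots,X_n}(x_1, x_2, \ldots, x_n)\,dx_1\,dx_2 \cdots dx_n.$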


The individual (or marginal) probability law of each of the random variables
$X_1, X_2, \ldots, X_n$ may be obtained from the joint probability law. In the
continuous case, for any $k = 1, 2, \ldots, n$ and any fixed number $x_k^0$,

(5.21)  $f_{X_k}(x_k^0) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{X_1,X_2,\ldots,X_n}(x_1, \ldots, x_{k-1}, x_k^0, x_{k+1}, \ldots, x_n)\,dx_1 \cdots dx_{k-1}\,dx_{k+1} \cdots dx_n.$

An analogous formula may be written in the discrete case for $p_{X_k}(x_k^0)$.

► Example 5A. Jointly discrete random variables. Consider a sample of
size 2 drawn with replacement (without replacement) from an urn
containing two white, one black, and two red balls. Let the random variables
$X_1$ and $X_2$ be defined as follows: for $k = 1, 2$, $X_k = 1$ or 0, depending on
whether the ball drawn on the $k$th draw is white or nonwhite. (i) Describe
the joint probability law of $(X_1, X_2)$. (ii) Describe the individual (or
marginal) probability laws of $X_1$ and $X_2$.

Solution: The random variables $X_1$ and $X_2$ are clearly jointly discrete.
Consequently, to describe their joint probability law, it suffices to state
their joint probability mass function $p_{X_1,X_2}(x_1, x_2)$. Similarly, to describe
their individual probability laws, it suffices to describe their individual
probability mass functions $p_{X_1}(x_1)$ and $p_{X_2}(x_2)$. These functions are
conveniently presented in the following tables:


Sampling with replacement: $p_{X_1,X_2}(x_1, x_2)$

                    x_2 = 0    x_2 = 1    p_{X_1}(x_1)
    x_1 = 0          9/25       6/25         3/5
    x_1 = 1          6/25       4/25         2/5
    p_{X_2}(x_2)     3/5        2/5

Sampling without replacement: $p_{X_1,X_2}(x_1, x_2)$

                    x_2 = 0    x_2 = 1    p_{X_1}(x_1)
    x_1 = 0          3/10       3/10         3/5
    x_1 = 1          3/10       1/10         2/5
    p_{X_2}(x_2)     3/5        2/5
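The entries of both tables can be reproduced by listing equally likely ordered draws. The Python sketch below (an illustration; the function names are hypothetical) builds the joint probability mass function with exact fractions and sums it as in (5.9) to obtain the marginals. It then tests whether $p_{X_1,X_2}(x_1, x_2) = p_{X_1}(x_1)\,p_{X_2}(x_2)$ holds at every point, the discrete-case factorization of the kind stated in (3.13).

```python
from fractions import Fraction
from itertools import permutations, product

URN = ["white", "white", "black", "red", "red"]  # two white, one black, two red balls

def joint_pmf(with_replacement):
    """Joint pmf of (X1, X2), X_k = 1 if the kth ball is white, from equally likely ordered draws."""
    if with_replacement:
        draws = list(product(range(len(URN)), repeat=2))
    else:
        draws = list(permutations(range(len(URN)), 2))
    pmf = {}
    for i, j in draws:
        key = (int(URN[i] == "white"), int(URN[j] == "white"))
        pmf[key] = pmf.get(key, 0) + Fraction(1, len(draws))
    return pmf

def marginals(pmf):
    """(5.9): sum the joint pmf over the other coordinate."""
    p1, p2 = {}, {}
    for (x1, x2), p in pmf.items():
        p1[x1] = p1.get(x1, 0) + p
        p2[x2] = p2.get(x2, 0) + p
    return p1, p2

for wr in (True, False):
    pmf = joint_pmf(wr)
    p1, p2 = marginals(pmf)
    factorizes = all(pmf[(a, b)] == p1[a] * p2[b] for a in (0, 1) for b in (0, 1))
    label = "with replacement" if wr else "without replacement"
    print(label, {k: str(v) for k, v in sorted(pmf.items())}, "factorizes:", factorizes)
```

With replacement the factorization holds at every point; without replacement it fails at $(1, 1)$, since $\tfrac{1}{10}$ differs from $\tfrac{2}{5} \cdot \tfrac{2}{5} = \tfrac{4}{25}$, even though the marginals are the same in both cases.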




► Example 5B. Jointly continuous random variables. Suppose that at
two points in a room (or on a city street or in the ocean) one measures the
intensity of sound caused by general background noise. Let $X_1$ and $X_2$
be random variables representing the intensity of sound at the two points.
Suppose that the joint probability law of the sound intensities, $X_1$ and $X_2$,
is continuous, with the joint probability density function given by

        $f_{X_1,X_2}(x_1, x_2) = x_1 x_2 \exp[-\tfrac{1}{2}(x_1^2 + x_2^2)]$  if $x_1 > 0$, $x_2 > 0$;   $= 0$  otherwise.

Find the individual probability density functions of $X_1$ and $X_2$. Further,
find $P[X_1 \le 1, X_2 \le 1]$ and $P[X_1 + X_2 < 1]$.

Solution: By (5.14), the individual probability density functions are
given by

        $f_{X_1}(x_1) = \int_0^{\infty} x_1 x_2 \exp[-\tfrac{1}{2}(x_1^2 + x_2^2)]\,dx_2 = x_1 \exp(-\tfrac{1}{2}x_1^2)$  for $x_1 > 0$,
        $f_{X_2}(x_2) = \int_0^{\infty} x_1 x_2 \exp[-\tfrac{1}{2}(x_1^2 + x_2^2)]\,dx_1 = x_2 \exp(-\tfrac{1}{2}x_2^2)$  for $x_2 > 0$,

and both vanish for nonpositive arguments. Note that the random variables
$X_1$ and $X_2$ are identically distributed. Next, the probability that each sound
intensity is less than or equal to 1 is given by

        $P[X_1 \le 1, X_2 \le 1] = \int_{-\infty}^{1} \int_{-\infty}^{1} f_{X_1,X_2}(x_1, x_2)\,dx_1\,dx_2 = \left(\int_0^1 x_1 e^{-x_1^2/2}\,dx_1\right)\left(\int_0^1 x_2 e^{-x_2^2/2}\,dx_2\right) = (1 - e^{-1/2})^2 = 0.1548.$

The probability that the sum of the sound intensities is less than 1 is
given by

        $P[X_1 + X_2 < 1] = \iint_{\{(x_1, x_2):\, x_1 + x_2 < 1\}} f_{X_1,X_2}(x_1, x_2)\,dx_1\,dx_2 = \int_0^1 dx_1\, x_1 e^{-x_1^2/2} \int_0^{1 - x_1} dx_2\, x_2 e^{-x_2^2/2} = \int_0^1 x_1 e^{-x_1^2/2}\left[1 - e^{-(1 - x_1)^2/2}\right] dx_1.$
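Both probabilities can be evaluated numerically. The Python sketch below (illustrative; the helper names are hypothetical) uses the marginal $x e^{-x^2/2}$ found above and its distribution function $1 - e^{-x^2/2}$. For $P[X_1 + X_2 < 1]$ it takes the inner integral in the closed form shown above and applies the midpoint rule to the outer integral.

```python
import math

def marginal_density(x):
    """x exp(-x^2/2) for x > 0: the marginal density of X1 (and of X2)."""
    return x * math.exp(-x * x / 2) if x > 0 else 0.0

def marginal_cdf(x):
    """Integral of marginal_density from 0 to x, i.e. 1 - exp(-x^2/2) for x > 0."""
    return 1.0 - math.exp(-x * x / 2) if x > 0 else 0.0

# The joint density factors, so P[X1 <= 1, X2 <= 1] is the square of one integral.
p_box = marginal_cdf(1.0) ** 2

def p_sum_below(t=1.0, n=20_000):
    """P[X1 + X2 < t]: inner integral in closed form, outer integral by the midpoint rule."""
    step = t / n
    return step * sum(
        marginal_density(x) * marginal_cdf(t - x)
        for x in ((i + 0.5) * step for i in range(n))
    )

print(round(p_box, 4))          # 0.1548
print(round(p_sum_below(), 4))  # 0.0342
```

For this particular density the outer integral also has a closed form, obtained by completing the square in $x_1^2 + (1 - x_1)^2$: it equals $(1 - e^{-1/2}) - e^{-1/4}\tfrac{\sqrt{\pi}}{2}\operatorname{erf}(\tfrac{1}{2})$, which agrees with the numerical value.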


► Example 5C. The maximum noise intensity. Suppose that at five
points in the ocean one measures the intensity of sound caused by general
background noise (the so-called ambient noise). Let $X_1$, $X_2$, $X_3$, $X_4$, and $X_5$
be random variables representing the intensity of sound at the various
points. Suppose that their joint probability law is continuous, with joint
probability density function given by

        $f_{X_1,\ldots,X_5}(x_1, \ldots, x_5) = x_1 x_2 x_3 x_4 x_5 \exp[-\tfrac{1}{2}(x_1^2 + x_2^2 + x_3^2 + x_4^2 + x_5^2)]$  if $x_1, x_2, x_3, x_4, x_5 > 0$;
        $f_{X_1,\ldots,X_5}(x_1, \ldots, x_5) = 0$  otherwise.

Define $Y$ as the maximum intensity; in symbols, $Y = \text{maximum}(X_1, X_2, X_3, X_4, X_5)$.
For any positive number $y$ the probability that $Y$ is less than or equal to $y$
is given by

        $P[Y \le y] = P[X_1 \le y, X_2 \le y, \ldots, X_5 \le y] = \int_{-\infty}^{y} dx_1 \int_{-\infty}^{y} dx_2 \cdots \int_{-\infty}^{y} dx_5\, f_{X_1,\ldots,X_5}(x_1, \ldots, x_5) = \left(\int_0^y x e^{-x^2/2}\,dx\right)^5 = (1 - e^{-y^2/2})^5.$
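The closed form $(1 - e^{-y^2/2})^5$ relies on the joint density factoring into five copies of the Example 5B marginal. The Python sketch below (an illustration, with a hypothetical sample size and seed) compares it with a Monte Carlo estimate. Each $X_i$ is drawn by inversion: if $U$ is uniform on $[0, 1)$, then $\sqrt{-2\log(1 - U)}$ has distribution function $1 - e^{-x^2/2}$.

```python
import math
import random

def prob_max_at_most(y, k=5):
    """P[maximum(X1, ..., Xk) <= y] = (1 - exp(-y^2/2))^k for the density of Example 5C."""
    return (1.0 - math.exp(-y * y / 2)) ** k if y > 0 else 0.0

def simulate_max_at_most(y, k=5, trials=20_000, seed=1):
    """Monte Carlo estimate of the same probability, sampling each X_i by inversion."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample_max = max(math.sqrt(-2.0 * math.log(1.0 - rng.random())) for _ in range(k))
        hits += sample_max <= y
    return hits / trials

for y in (1.0, 2.0, 3.0):
    print(y, round(prob_max_at_most(y), 4), round(simulate_max_at_most(y), 4))
```

The simulated values should differ from the closed form only by sampling error, which is of order $1/\sqrt{\text{trials}}$.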


THEORETICAL EXERCISE

5.1. Multivariate distributions with given marginal distributions. Let $f_1(\cdot)$ and
$f_2(\cdot)$ be two probability density functions. An infinity of joint probability
density functions $f(\cdot,\cdot)$ exist of which $f_1(\cdot)$ and $f_2(\cdot)$ are the marginal probability
density functions [that is, (3.4) holds]. One method of constructing
a joint probability density function $f(\cdot,\cdot)$ with given marginal probability
density functions $f_1(\cdot)$ and $f_2(\cdot)$ is by defining, for a given constant $\alpha$ such
that $|\alpha| \le 1$,

(5.22)  $f(x_1, x_2) = f_1(x_1) f_2(x_2)\{1 + \alpha[2F_1(x_1) - 1][2F_2(x_2) - 1]\},$

in which $F_1(\cdot)$ and $F_2(\cdot)$ are the distribution functions corresponding to
$f_1(\cdot)$ and $f_2(\cdot)$, respectively. Show that the distribution function $F(\cdot,\cdot)$
corresponding to $f(\cdot,\cdot)$ is given by

(5.23)  $F(x_1, x_2) = F_1(x_1) F_2(x_2)\{1 + \alpha[1 - F_1(x_1)][1 - F_2(x_2)]\}.$

Equations (5.22) and (5.23) are due to E. J. Gumbel, "Distributions à
plusieurs variables dont les marges sont données," C. R. Acad. Sci. Paris,
Vol. 246 (1958), pp. 2717-2720.
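A numerical sanity check of this exercise, not a proof: the Python sketch below takes, purely for illustration, exponential marginals with mean 1 and the hypothetical value $\alpha = 0.5$. It then confirms by the midpoint rule that integrating (5.22) over $x_2$ returns $f_1(x_1)$, and that integrating (5.22) over the rectangle $[0, x_1] \times [0, x_2]$ reproduces (5.23).

```python
import math

ALPHA = 0.5  # hypothetical choice; any |alpha| <= 1 keeps (5.22) nonnegative

def f1(x):
    """Illustrative marginal density: exponential with mean 1."""
    return math.exp(-x) if x >= 0 else 0.0

def F1(x):
    """Distribution function corresponding to f1."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

f2, F2 = f1, F1  # same illustrative marginal for both coordinates

def f(x1, x2):
    """Joint density (5.22)."""
    return f1(x1) * f2(x2) * (1 + ALPHA * (2 * F1(x1) - 1) * (2 * F2(x2) - 1))

def F(x1, x2):
    """Joint distribution function (5.23)."""
    return F1(x1) * F2(x2) * (1 + ALPHA * (1 - F1(x1)) * (1 - F2(x2)))

def midpoint(g, lo, hi, n):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    step = (hi - lo) / n
    return step * sum(g(lo + (i + 0.5) * step) for i in range(n))

# Marginal check at x1 = 0.7: integrate out x2 over [0, 40] (the neglected tail is negligible).
marginal = midpoint(lambda t: f(0.7, t), 0.0, 40.0, 40_000)

# Distribution-function check: iterated midpoint rule over [0, 1] x [0, 1.5].
double = midpoint(lambda u: midpoint(lambda v: f(u, v), 0.0, 1.5, 300), 0.0, 1.0, 300)

print(round(marginal, 6), round(f1(0.7), 6))
print(round(double, 6), round(F(1.0, 1.5), 6))
```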


EXERCISES

In exercises 5.1 to 5.3 consider a sample of size 3 drawn with replacement
(without replacement) from an urn containing (i) 1 white and 2 black balls,
(ii) 1 white, 1 black, and 1 red ball. For $k = 1, 2, 3$ let $X_k = 1$ or 0 depending on
whether the ball drawn on the $k$th draw is white or nonwhite.

5.1. Describe the joint probability law of $(X_1, X_2, X_3)$.

5.2. Describe the individual (marginal) probability laws of $X_1$, $X_2$, $X_3$.

5.3. Describe the individual probability laws of the random variables $Y_1$, $Y_2$,
and $Y_3$, in which $Y_1 = X_1 + X_2 + X_3$, $Y_2 = \text{maximum}(X_1, X_2, X_3)$,
and $Y_3 = \text{minimum}(X_1, X_2, X_3)$.


In exercises 5.4 to 5.6 consider 2 random variables, $X_1$ and $X_2$, with joint
probability law specified by the joint probability density function

(a) $f_{X_1,X_2}(x_1, x_2) = \tfrac{1}{4}$  if $0 \le x_1 \le 2$ and $0 \le x_2 \le 2$;   $= 0$  otherwise.
(b) $f_{X_1,X_2}(x_1, x_2) = e^{-(x_1 + x_2)}$  if $x_1 \ge 0$ and $x_2 \ge 0$;   $= 0$  otherwise.

5.4. Find (i) $P[X_1 \le 1, X_2 \le 1]$, (ii) $P[X_1 + X_2 \le 1]$, (iii) $P[X_1 + X_2 > 2]$.

5.5. Find (i) $P[X_1 < 2X_2]$, (ii) $P[X_1 > 1]$, (iii) $P[X_1 = X_2]$.

5.6. Find (i) $P[X_2 > 1 \mid X_1 \le 1]$, (ii) $P[X_1 > X_2 \mid X_2 > 1]$.

In exercises 5.7 to 5.10 consider 2 random variables, $X_1$ and $X_2$, with the joint
probability law specified by the probability mass function $p_{X_1,X_2}(\cdot,\cdot)$ given for
all $x_1$ and $x_2$ at which it is positive by (a) Table 5A, (b) Table 5B, in which for
brevity we write $h$ for $\tfrac{1}{60}$.


TABLE 5A 


Px, x C 22 


Px,@) 10h 


5.7. Show that the individual probability mass functions of $X_1$ and $X_2$ may be obtained by summing the respective columns and rows as indicated. Are $X_1$ and $X_2$ (i) jointly discrete, (ii) individually discrete?

5.8. Find (i) $P[X_1 \le 1, X_2 \le 1]$, (ii) $P[X_1 + X_2 \le 1]$, (iii) $P[X_1 + X_2 > 2]$.


294 RANDOM VARIABLES CH. 7

TABLE 5B
Joint probability mass function $p_{X_1,X_2}(x_1, x_2)$

                 x1 = 0   x1 = 1   x1 = 2   p_X2(x2)
    x2 = 0         h        4h       9h       14h
    x2 = 1         2h       6h       12h      20h
    x2 = 2         3h       8h       3h       14h
    x2 = 3         4h       2h       6h       12h
    p_X1(x1)       10h      20h      30h

5.9. Find (i) $P[X_1 < 2X_2]$, (ii) $P[X_1 > 1]$, (iii) $P[X_1 = X_2]$.

5.10. Find (i) $P[X_1 > X_2 \mid X_2 > 1]$, (ii) $P[X_1^2 + X_2^2 \le 1]$.


6. INDEPENDENT RANDOM VARIABLES 


In section 2 of Chapter 3 we defined the notion of independent trials. In this section we define the notion of independent random variables. Two jointly distributed random variables $X_1$ and $X_2$, with joint distribution function $F_{X_1,X_2}(\cdot,\cdot)$, are said to be independent if for any two Borel sets $B_1$ and $B_2$ of real numbers the events $[X_1 \text{ is in } B_1]$ and $[X_2 \text{ is in } B_2]$ are independent; that is,

$$P[X_1 \text{ is in } B_1 \text{ and } X_2 \text{ is in } B_2] = P[X_1 \text{ is in } B_1]\,P[X_2 \text{ is in } B_2]. \tag{6.1}$$

The foregoing definition may be expressed equivalently: the random variables $X_1$ and $X_2$ are independent if, for any event $A_1$ depending only on the random variable $X_1$ and any event $A_2$ depending only on the random variable $X_2$, $P[A_1 A_2] = P[A_1]\,P[A_2]$, so that the events $A_1$ and $A_2$ are independent.

It may be shown that if (6.1) holds for sets $B_1$ and $B_2$ which are infinitely extended intervals of the form $B_1 = \{x' : x' \le x_1\}$ and $B_2 = \{x' : x' \le x_2\}$, for any real numbers $x_1$ and $x_2$, then (6.1) holds for any Borel sets $B_1$ and $B_2$ of real numbers. We therefore have the following equivalent formulation of the notion of the independence of two jointly distributed random variables $X_1$ and $X_2$.

SEC. 6 INDEPENDENT RANDOM VARIABLES 295

Two jointly distributed random variables, $X_1$ and $X_2$, are independent if their joint distribution function $F_{X_1,X_2}(\cdot,\cdot)$ may be written as the product of their individual distribution functions $F_{X_1}(\cdot)$ and $F_{X_2}(\cdot)$ in the sense that, for any real numbers $x_1$ and $x_2$,

$$F_{X_1,X_2}(x_1, x_2) = F_{X_1}(x_1)\,F_{X_2}(x_2). \tag{6.2}$$


Similarly, two jointly continuous random variables, $X_1$ and $X_2$, are independent if their joint probability density function $f_{X_1,X_2}(\cdot,\cdot)$ may be written as the product of their individual probability density functions $f_{X_1}(\cdot)$ and $f_{X_2}(\cdot)$ in the sense that, for any real numbers $x_1$ and $x_2$,

$$f_{X_1,X_2}(x_1, x_2) = f_{X_1}(x_1)\,f_{X_2}(x_2). \tag{6.3}$$

Equation (6.3) follows from (6.2) by differentiating both sides of (6.2) first with respect to $x_1$ and then with respect to $x_2$. Equation (6.2) follows from (6.3) by integrating both sides of (6.3) over the region $\{(x_1', x_2') : x_1' \le x_1,\ x_2' \le x_2\}$.

Similarly, two jointly discrete random variables, $X_1$ and $X_2$, are independent if their joint probability mass function $p_{X_1,X_2}(\cdot,\cdot)$ may be written as the product of their individual probability mass functions $p_{X_1}(\cdot)$ and $p_{X_2}(\cdot)$ in the sense that, for all real numbers $x_1$ and $x_2$,

$$p_{X_1,X_2}(x_1, x_2) = p_{X_1}(x_1)\,p_{X_2}(x_2). \tag{6.4}$$


Two random variables $X_1$ and $X_2$ which do not satisfy any of the foregoing relations are said to be dependent or nonindependent.
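The factorization test (6.4) can be carried out mechanically. The sketch below assumes $h = 1/60$ (so that the entries of Table 5B sum to 1), rebuilds Table 5B with exact fractions, recovers the marginals by summing columns and rows, and tests (6.4) at every pair $(x_1, x_2)$. The function names are hypothetical.

```python
from fractions import Fraction

h = Fraction(1, 60)  # assumed value of h, forced by the entries of Table 5B summing to 60h

# Rows are x2 = 0, 1, 2, 3; columns are x1 = 0, 1, 2 (multiples of h, read from Table 5B).
weights = [[1, 4, 9], [2, 6, 12], [3, 8, 3], [4, 2, 6]]
table_5b = {(x1, x2): h * w
            for x2, row in enumerate(weights)
            for x1, w in enumerate(row)}

def marginals(joint):
    """Column sums give p_X1, row sums give p_X2."""
    p1, p2 = {}, {}
    for (x1, x2), p in joint.items():
        p1[x1] = p1.get(x1, 0) + p
        p2[x2] = p2.get(x2, 0) + p
    return p1, p2

def is_independent(joint):
    """Test the factorization (6.4) at every pair (x1, x2) of possible values."""
    p1, p2 = marginals(joint)
    return all(joint.get((a, b), 0) == p1[a] * p2[b] for a in p1 for b in p2)

p1, p2 = marginals(table_5b)
print(p1, p2)
print(is_independent(table_5b))        # Table 5B fails (6.4)

# For contrast: the product of the same marginals satisfies (6.4) by construction.
product_table = {(a, b): p1[a] * p2[b] for a in p1 for b in p2}
print(is_independent(product_table))
```

A single failing entry suffices: for instance $p_{X_1,X_2}(0, 0) = h$, whereas $p_{X_1}(0)\,p_{X_2}(0) = (10h)(14h) = 140h^2$.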


► Example 6A. Independent and dependent random variables. In example 3A the random variables $X_1$ and $X_2$ are independent in the case of sampling with replacement but are dependent in the case of sampling without replacement. In either case, the random variables $X_1$ and $X_2$ are identically distributed. In example 5B the random variables $X_1$ and $X_2$ are independent and identically distributed. It may be seen from the definitions given at the end of this section that the random variables $X_1, X_2, \ldots, X_n$ considered in example 5C are independent and identically distributed. ◄


Independent random variables have the following exceedingly important property:

THEOREM 6A. Let the random variables $Y_1$ and $Y_2$ be obtained from the random variables $X_1$ and $X_2$ by some functional transformation, so that $Y_1 = g_1(X_1)$ and $Y_2 = g_2(X_2)$ for some Borel functions $g_1(\cdot)$ and $g_2(\cdot)$ of a real variable. Independence of the random variables $X_1$ and $X_2$ implies independence of the random variables $Y_1$ and $Y_2$.


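Proof. First, for any set $B_1$ of real numbers, write $g_1^{-1}(B_1) = \{\text{real numbers } x : g_1(x) \text{ is in } B_1\}$. It is clear that the event that $Y_1$ is in $B_1$ occurs if and only if the event that $X_1$ is in $g_1^{-1}(B_1)$ occurs. Similarly, for any set $B_2$ the events that $Y_2$ is in $B_2$ and that $X_2$ is in $g_2^{-1}(B_2)$ occur, or fail to occur, together. Consequently, by (6.1),

$$\begin{aligned}
P[Y_1 \text{ is in } B_1, Y_2 \text{ is in } B_2] &= P[X_1 \text{ is in } g_1^{-1}(B_1), X_2 \text{ is in } g_2^{-1}(B_2)] \\
&= P[X_1 \text{ is in } g_1^{-1}(B_1)]\,P[X_2 \text{ is in } g_2^{-1}(B_2)] \\
&= P[g_1(X_1) \text{ is in } B_1]\,P[g_2(X_2) \text{ is in } B_2] \\
&= P[Y_1 \text{ is in } B_1]\,P[Y_2 \text{ is in } B_2],
\end{aligned} \tag{6.5}$$

and the proof of theorem 6A is concluded.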
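For discrete random variables theorem 6A can be checked exactly. In the sketch below the laws of $X_1$ and $X_2$ and the many-to-one functions $g_1$, $g_2$ are hypothetical choices made only for illustration; the code builds the joint probability mass function of $(Y_1, Y_2)$ from the independent pair and tests the factorization (6.4).

```python
from fractions import Fraction
from itertools import product

def law_of_transformed_pair(p1, p2, g1, g2):
    """Joint pmf of (Y1, Y2) = (g1(X1), g2(X2)) for independent X1, X2 with pmfs p1, p2."""
    joint = {}
    for (x1, a), (x2, b) in product(p1.items(), p2.items()):
        key = (g1(x1), g2(x2))
        # independence of X1, X2: P[X1 = x1, X2 = x2] = p1(x1) p2(x2)
        joint[key] = joint.get(key, 0) + a * b
    return joint

def factorizes(joint):
    """True when the joint pmf equals the product of its marginals everywhere, as in (6.4)."""
    m1, m2 = {}, {}
    for (y1, y2), p in joint.items():
        m1[y1] = m1.get(y1, 0) + p
        m2[y2] = m2.get(y2, 0) + p
    return all(joint.get((y1, y2), 0) == m1[y1] * m2[y2] for y1 in m1 for y2 in m2)

# Hypothetical laws chosen for illustration only.
p1 = {x: Fraction(1, 5) for x in range(-2, 3)}              # X1 uniform on {-2, ..., 2}
p2 = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}

def g1(x):
    return x * x        # many-to-one: -2 and 2 both map to 4

def g2(x):
    return min(x, 1)    # many-to-one: 1 and 2 both map to 1

joint_y = law_of_transformed_pair(p1, p2, g1, g2)
print(joint_y)
print(factorizes(joint_y))
```

Exact fractions are used so that the factorization is tested as an equality rather than up to rounding.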


► Example 6B. Sound intensity is often measured in decibels. A reference level of intensity $I_0$ is adopted. Then a sound of intensity $X$ is reported as having $Y$ decibels:

$$Y = 10 \log_{10} \frac{X}{I_0}.$$

Now if $X_1$ and $X_2$ are the sound intensities at two different points on a city street, let $Y_1$ and $Y_2$ be the corresponding sound intensities measured in decibels. If the original sound intensities $X_1$ and $X_2$ are independent random variables, then from theorem 6A it follows that $Y_1$ and $Y_2$ are independent random variables. ◄


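The foregoing notions extend at once to several jointly distributed random variables. We define $n$ jointly distributed random variables $X_1, X_2, \ldots, X_n$ as independent if any one of the following equivalent conditions holds: (i) for any $n$ Borel sets $B_1, B_2, \ldots, B_n$ of real numbers

$$P[X_1 \text{ is in } B_1, X_2 \text{ is in } B_2, \ldots, X_n \text{ is in } B_n] = P[X_1 \text{ is in } B_1]\,P[X_2 \text{ is in } B_2] \cdots P[X_n \text{ is in } B_n]; \tag{6.6}$$

(ii) for any real numbers $x_1, x_2, \ldots, x_n$

$$F_{X_1,X_2,\ldots,X_n}(x_1, x_2, \ldots, x_n) = F_{X_1}(x_1)\,F_{X_2}(x_2) \cdots F_{X_n}(x_n); \tag{6.7}$$

(iii) if the random variables are jointly continuous, then for any real numbers $x_1, x_2, \ldots, x_n$

$$f_{X_1,X_2,\ldots,X_n}(x_1, x_2, \ldots, x_n) = f_{X_1}(x_1)\,f_{X_2}(x_2) \cdots f_{X_n}(x_n). \tag{6.8}$$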

THEORETICAL EXERCISES


6.1. Give an example of 3 random variables, $X_1$, $X_2$, $X_3$, which are independent when taken two at a time but not independent when taken together. Hint: Let $A_1$, $A_2$, $A_3$ be events that have the properties asserted; see example 1C of Chapter 3. Define $X_i = 1$ or 0, depending on whether the event $A_i$ has or has not occurred.


6.2. Give an example of two random variables, $X_1$ and $X_2$, which are not independent, but such that $X_1^2$ and $X_2^2$ are independent. Does such an example prove that the converse of theorem 6A is false?


6.3. Factorization rule for the probability density function of independent random variables. Show that $n$ jointly continuous random variables $X_1, X_2, \ldots, X_n$ are independent if and only if their joint probability density function, for all real numbers $x_1, x_2, \ldots, x_n$, may be written

$$f_{X_1,X_2,\ldots,X_n}(x_1, x_2, \ldots, x_n) = h_1(x_1)\,h_2(x_2) \cdots h_n(x_n)$$

in terms of some Borel functions $h_1(\cdot), h_2(\cdot), \ldots, h_n(\cdot)$.


EXERCISES 


6.1. The output of a certain electronic apparatus is measured at 5 different times. Let $X_1, X_2, \ldots, X_5$ be the observations obtained. Assume that $X_1, X_2, \ldots, X_5$ are independent random variables, each Rayleigh distributed with parameter $\alpha = 2$. Find the probability that $\max(X_1, X_2, X_3, X_4, X_5) > 4$. (Recall that $f_X(x) = \frac{x}{\alpha^2} e^{-x^2/(2\alpha^2)}$ for $x > 0$ and is equal to 0 elsewhere.)


6.2. Suppose 10 identical radar sets have a failure law following the exponential distribution. The sets operate independently of one another and have a failure rate of $\lambda = 1$ set/10? hours. What length of time will all 10 radar sets operate satisfactorily with a probability of 0.99?


6.3. Let $X$ and $Y$ be jointly continuous random variables, with a probability density function

$$f_{X,Y}(x, y) = \frac{1}{2\pi} \exp\left[-\tfrac{1}{2}(x^2 + y^2)\right].$$

(i) Are $X$ and $Y$ independent random variables?

(ii) Are $X$ and $Y$ identically distributed random variables?

(iii) Are $X$ and $Y$ normally distributed random variables?

(iv) Find $P[X^2 + Y^2 \le 4]$. Hint: Use polar coordinates.

(v) Are $X^2$ and $Y^2$ independent random variables? Hint: Use theorem 6A.

(vi) Find $P[X^2 \le 2]$, $P[Y^2 \le 2]$.

(vii) Find the individual probability density functions of $X^2$ and $Y^2$. [Use (8.8).]


(viii) Find the joint probability density function of $X^2$ and $Y^2$. [Use (6.3).]

(ix) Would you expect that $P[X^2 Y^2 \le 4] > P[X^2 \le 2]\,P[Y^2 \le 2]$?

(x) Would you expect that $P[X^2 Y^2 \le 4] = P[X^2 \le 2]\,P[Y^2 \le 2]$?


6.4. Let $X_1$, $X_2$, $X_3$ be independent random variables, each uniformly distributed on the interval 0 to 1. Determine the number $a$ such that

(i) P[at least one of the numbers $X_1, X_2, X_3$ is greater than $a$] = 0.9,

(ii) P[at least 2 of the numbers $X_1, X_2, X_3$ are greater than $a$] = 0.9.

Hint: To obtain a numerical answer, use the table of binomial probabilities.
6.5. Consider two events $A$ and $B$ such that $P[A] = …$, $P[B \mid A] = 1$, and $P[A \mid B] = …$. Let the random variables $X$ and $Y$ be defined as $X = 1$ or 0, depending on whether the event $A$ has or has not occurred, and $Y = 1$ or 0, depending on whether the event $B$ has or has not occurred. State whether each of the following statements is true or false:

(i) The random variables $X$ and $Y$ are independent;

(ii) $P[X^2 + Y^2 = 1] = 1$;

(iii) $P[XY = X^2 Y^2] = 1$;

(iv) The random variable $Y$ is uniformly distributed on the interval 0 to 1;

(v) The random variables $X$ and $Y$ are identically distributed.

6.6. Show that the two random variables $X_1$ and $X_2$ are independent if their joint probability mass function is given by Table 5A, and dependent if their joint probability mass function is given by Table 5B.


In exercises 6.7 to 6.9 let $X_1$ and $X_2$ be independent random variables, uniformly distributed over the interval 0 to 1.

6.7. Find (i) $P[X_1 + X_2 < 0.5]$, (ii) $P[X_1 - X_2 < 0.5]$.

6.8. Find (i) $P[X_1 X_2 < 0.5]$, (ii) $P[X_1 < 0.5]$, (iii) $P[X_1^2 < 0.5]$.

6.9. Find (i) $P[X_1^2 + X_2^2 < 0.5]$, (ii) $P[e^{X_1} < 0.5]$, (iii) $P[\cos \pi X_1 < 0.5]$.


7. RANDOM SAMPLES

The concepts now assembled enable us to explain some of the major meanings assigned to the word "random" in the mathematical theory of probability.

Suppose that it is possible to make repeated measurements $X_1, X_2, \ldots, X_n$ of a random variable $X$. For example, $X_1, X_2, \ldots, X_n$ may be the lifetimes of $n$ light bulbs, or they may be the numbers on balls drawn (with replacement) from an urn containing balls numbered 1 to 100, and so on.

SEC. 7 RANDOM SAMPLES 299

The set of $n$ measurements $X_1, X_2, \ldots, X_n$ is spoken of as a sample of size $n$ of the random variable $X$, by which is meant that each of the measurements $X_k$, for $k = 1, 2, \ldots, n$, is a random variable whose distribution function $F_{X_k}(\cdot)$ is equal, as a function of $x$, to the distribution function $F_X(\cdot)$ of the random variable $X$. If, further, the random variables $X_1, X_2, \ldots, X_n$ are independent, then we say that $X_1, X_2, \ldots, X_n$ is a random sample (or an independent sample) of size $n$ of the random variable $X$. Thus the adjective "random," when used to describe a sample of a random variable, indicates that the members of the sample are independent identically distributed random variables.


► Example 7A. Suppose that the life in hours of electronic tubes of a certain type is known to be approximately normally distributed with parameters $m = 160$ and $\sigma = 20$. What is the probability that a random sample of four tubes will contain no tube with a lifetime of less than 180 hours?

Solution: Let $X_1$, $X_2$, $X_3$, and $X_4$ denote the respective lifetimes of the four tubes in the sample. The assumption that the tubes constitute a random sample of a random variable normally distributed with parameters $m = 160$ and $\sigma = 20$ is to be interpreted as assuming that the random variables $X_1$, $X_2$, $X_3$, and $X_4$ are independent, with individual probability density functions, for $k = 1, \ldots, 4$,

$$f_{X_k}(x) = \frac{1}{20\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x - 160}{20}\right)^2\right]. \tag{7.1}$$

The probability that each tube in the sample has a lifetime greater than, or equal to, 180 hours is given by

$$P[X_1 \ge 180, X_2 \ge 180, X_3 \ge 180, X_4 \ge 180] = P[X_1 \ge 180]\,P[X_2 \ge 180]\,P[X_3 \ge 180]\,P[X_4 \ge 180] = (0.1587)^4,$$

since $P[X_k \ge 180] = 1 - \Phi\left(\frac{180 - 160}{20}\right) = 1 - \Phi(1) = 0.1587$. ◄
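The arithmetic of example 7A can be reproduced with the error function, since $\Phi(z) = \tfrac{1}{2}[1 + \operatorname{erf}(z/\sqrt{2})]$. A minimal sketch (the helper name `normal_cdf` is ours, not the text's):

```python
import math

def normal_cdf(x, m=0.0, sigma=1.0):
    """Phi((x - m) / sigma) written with the error function."""
    return 0.5 * (1.0 + math.erf((x - m) / (sigma * math.sqrt(2.0))))

m, sigma, threshold, sample_size = 160.0, 20.0, 180.0, 4
p = 1.0 - normal_cdf(threshold, m, sigma)   # P[X_k >= 180] = 1 - Phi(1)
all_long = p ** sample_size                 # product of four equal factors, by independence
print(f"P[X_k >= 180] = {p:.4f}")
print(f"P[all four tubes >= 180] = {all_long:.6f}")
```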
A second meaning of the word "random" arises when it is used to describe a sample drawn from a finite population. A sample, each of whose components is drawn from a finite population, is said to be a random sample if at each draw all candidates available for selection have an equal probability of being selected. The word "random" was used in this sense throughout Chapter 2.


► Example 7B. As in example 7A, consider electronic tubes of a certain type whose lifetimes are normally distributed with parameters $m = 160$ and $\sigma = 20$. Let a random sample of four tubes be put into a box. Choose a tube at random from the box. What is the probability that the tube selected will have a lifetime greater than 180 hours?

Solution: For $k = 0, 1, \ldots, 4$ let $A_k$ be the event that the box contains $k$ tubes with a lifetime greater than 180 hours. Since the tube lifetimes are independent random variables with probability density functions given by (7.1), it follows that

$$P[A_k] = \binom{4}{k}(0.1587)^k (0.8413)^{4-k}. \tag{7.2}$$

Let $B$ be the event that the tube selected from the box has a lifetime greater than 180 hours. The assumption that the tube is selected at random is to be interpreted as assuming that

$$P[B \mid A_k] = \frac{k}{4}, \qquad k = 0, 1, \ldots, 4. \tag{7.3}$$

The probability of the event $B$ is then given by

$$P[B] = \sum_{k=0}^{4} P[B \mid A_k]\,P[A_k] = \sum_{k=0}^{4} \frac{k}{4}\binom{4}{k} p^k q^{4-k}, \tag{7.4}$$

where we have let $p = 0.1587$, $q = 0.8413$. Then, since $\frac{k}{4}\binom{4}{k} = \binom{3}{k-1}$,

$$P[B] = p \sum_{k=1}^{4} \binom{3}{k-1} p^{k-1} q^{4-k} = p\,(p + q)^3 = p, \tag{7.5}$$

so that the probability that a tube selected at random from a random sample will have a lifetime greater than 180 hours is equal to $p$, the probability that any single tube has a lifetime greater than 180 hours; compare the result given in theoretical exercise 4.1 of Chapter 4. ◄
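Identity (7.5) holds for any $p$, because $\frac{k}{4}\binom{4}{k} = \binom{3}{k-1}$. The sketch below evaluates the sum (7.4) directly; the box size is left as a parameter (4 in the example), an illustrative generalization, so the same identity can be seen for other sizes.

```python
from math import comb

def prob_selected_tube_long(p, box_size=4):
    """(7.4): sum over k of P[B | A_k] P[A_k], with P[B | A_k] = k / box_size
    and P[A_k] binomial as in (7.2)."""
    q = 1.0 - p
    return sum((k / box_size) * comb(box_size, k) * p ** k * q ** (box_size - k)
               for k in range(box_size + 1))

p = 0.1587
print(prob_selected_tube_long(p))          # agrees with p, as (7.5) shows
print(prob_selected_tube_long(0.3, 7))     # the identity does not depend on the box size
```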


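The word "random" has a third meaning. The phrase "a point chosen randomly from the interval $a$ to $b$" is used to describe a random variable obeying a uniform probability law over the interval $a$ to $b$, whereas the phrase "$n$ points chosen randomly from the interval $a$ to $b$" is used to describe $n$ independent random variables, each obeying a uniform probability law over the interval $a$ to $b$. Problems involving randomly chosen points have long been discussed under the heading of "geometrical probabilities."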


► Example 7C. Two points are selected randomly on a line of length $a$ so as to be on opposite sides of the mid-point of the line. Find the probability that the distance between them is less than $\tfrac{1}{3}a$.

Solution: Introduce a coordinate system on the line so that its left-hand endpoint is 0 and its right-hand endpoint is $a$. Let $X_1$ be the coordinate of the point selected randomly in the interval 0 to $\tfrac{1}{2}a$, and let $X_2$ be the coordinate of the point selected randomly in the interval $\tfrac{1}{2}a$ to $a$. We assume that the random variables $X_1$ and $X_2$ are independent and that each obeys a uniform probability law over its interval. The joint probability density function of $X_1$ and $X_2$ is then

$$f_{X_1,X_2}(x_1, x_2) = \begin{cases} \dfrac{4}{a^2}, & 0 < x_1 < \dfrac{a}{2},\ \dfrac{a}{2} < x_2 < a,\\[4pt] 0, & \text{otherwise.}\end{cases} \tag{7.6}$$

The event $B$ that the distance between the two points selected is less than $\tfrac{1}{3}a$ is then the event $[X_2 - X_1 < \tfrac{1}{3}a]$. The probability of this event is the probability attached to the cross-hatched area in Fig. 7A. Because the joint density (7.6) is constant, this probability can be represented as the ratio of the area of the cross-hatched triangle to the area of the shaded rectangle; thus

$$P\left[X_2 - X_1 < \tfrac{1}{3}a\right] = \frac{\tfrac{1}{2}\left(\tfrac{1}{3}a\right)^2}{\left(\tfrac{1}{2}a\right)^2} = \frac{2}{9}. \tag{7.7}$$ ◄
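A Monte Carlo check of (7.7), assuming $a = 1$ and Python's standard `random` module with a fixed seed (illustrative choices); the estimate should land near $2/9 \approx 0.222$ up to sampling error.

```python
import random

def estimate_7c(a=1.0, trials=200_000, seed=7):
    """Fraction of trials in which X2 - X1 < a/3, with X1, X2 as in example 7C."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x1 = rng.uniform(0, a / 2)       # point in the left half
        x2 = rng.uniform(a / 2, a)       # independent point in the right half
        hits += (x2 - x1) < a / 3
    return hits / trials

print(estimate_7c(), 2 / 9)
```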
► Example 7D. Consider again the random variables $X_1$ and $X_2$ defined in example 7C. Find the probability that the three line segments (from 0 to $X_1$, from $X_1$ to $X_2$, and from $X_2$ to $a$) could be made to form the three sides of a triangle.

Solution: In order that the three line segments mentioned can form a triangle, it is necessary and sufficient that the following inequalities be fulfilled (why?):

$$\begin{aligned}
X_1 &< (X_2 - X_1) + (a - X_2) &&\text{or}\quad 2X_1 < a,\\
(a - X_2) &< (X_2 - X_1) + X_1 &&\text{or}\quad a < 2X_2,\\
(X_2 - X_1) &< X_1 + (a - X_2) &&\text{or}\quad 2X_2 < a + 2X_1.
\end{aligned} \tag{7.8}$$

The probability of these inequalities being fulfilled is the probability of the cross-hatched area in Fig. 7B, which is clearly $\tfrac{1}{2}$: the first two inequalities hold automatically, since $X_1 < \tfrac{1}{2}a < X_2$, and the line $x_2 = x_1 + \tfrac{1}{2}a$ of the third cuts the shaded rectangle along its diagonal. It might be noted that if each of the two points, with coordinates $X_1$ and $X_2$, is chosen randomly on the interval 0 to $a$, then the probability is only $\tfrac{1}{4}$ that the three line segments determined by the two points could be made to form the three sides of a triangle.
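Both probabilities in example 7D can be checked by simulation. The sketch below assumes $a = 1$ and a fixed seed; the `opposite_halves` flag (a name of ours) switches between the setting of example 7C and two points chosen independently on the whole interval, which should give about $\tfrac{1}{2}$ and $\tfrac{1}{4}$ respectively.

```python
import random

def forms_triangle(s1, s2, s3):
    """Three lengths form a triangle iff each is less than the sum of the other two."""
    return s1 < s2 + s3 and s2 < s1 + s3 and s3 < s1 + s2

def estimate_triangle(opposite_halves, a=1.0, trials=100_000, seed=11):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if opposite_halves:                       # setting of examples 7C and 7D
            x1, x2 = rng.uniform(0, a / 2), rng.uniform(a / 2, a)
        else:                                     # both points anywhere in (0, a)
            x1, x2 = sorted((rng.uniform(0, a), rng.uniform(0, a)))
        hits += forms_triangle(x1, x2 - x1, a - x2)
    return hits / trials

print(estimate_triangle(True), estimate_triangle(False))
```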


Problems involving geometrical probability have played a major role in the development of the modern conception of probability. In the nineteenth century the Laplacean definition of probability was widely accepted. It was believed that probability problems could be given unique solutions by finding the proper framework of "equally likely" descriptions. Against this point of view, examples were constructed that admitted several equally plausible, but incompatible, solutions. We now give an example similar to one given by Joseph Bertrand in his treatise Calcul des probabilités, Paris, 1889, p. 4, and later called by Poincaré "Bertrand's paradox." It was pointed out to the author by one of his students that this example should serve as a warning to all persons who adopt practical policies on the basis of theoretical solutions, without first establishing that the assumptions underlying the solutions are in accord with the experimentally observed facts.

Fig. 7A. (boundary line $x_2 = x_1 + \tfrac{1}{3}a$)    Fig. 7B. (boundary line $x_2 = x_1 + \tfrac{1}{2}a$)


► Example 7E. Bertrand's paradox. Let a chord be chosen randomly in a circle of radius $r$. What is the probability that the length $X$ of the chord will be less than the radius $r$?

Solution: One method of randomly choosing a chord is as follows: let $Y_1$ and $Y_2$ be points chosen randomly in the interval 0 to $2\pi$ and the interval 0 to $r$, respectively. Draw a chord by letting $Y_1$ be the angle made by the chord with a fixed reference line and by letting $Y_2$ be the (perpendicular) distance of the mid-point of the chord from the center of the circle (see Fig. 7C). A second method of randomly choosing a chord is as follows: let $Z_1$ and $Z_2$ be points chosen randomly in the interval 0 to $2\pi$ and the interval 0 to $\pi/2$, respectively. Draw a chord by letting $Z_1$ and $Z_2$ be the angles indicated in Fig. 7D. The reader may be able to think of other methods of choosing points to determine a chord. Six different solutions of Bertrand's paradox are given in Czuber, Wahrscheinlichkeitsrechnung, B. G. Teubner, Leipzig, 1908, pp. 106-109.

Fig. 7C.    Fig. 7D.

The length $X$ of the chord may be expressed in terms of the random variables $Y_2$ or $Z_2$:

$$X = 2\sqrt{r^2 - Y_2^2} \qquad \text{or} \qquad X = 2r\cos Z_2. \tag{7.9}$$

Consequently $P[X < r] = P[Y_2 > \tfrac{1}{2}r\sqrt{3}]$, or $P[X < r] = P[\cos Z_2 < \tfrac{1}{2}] = P[Z_2 > \pi/3]$. In both cases the required probability is equal to the ratio of the areas of the cross-hatched regions in Figs. 7C and 7D to the areas of the corresponding shaded regions. The first solution yields the answer

$$P[X < r] = \frac{2\pi r\left(1 - \tfrac{1}{2}\sqrt{3}\right)}{2\pi r} = 1 - \tfrac{1}{2}\sqrt{3} = 0.134, \tag{7.10}$$

whereas the second solution yields the answer

$$P[X < r] = \frac{2\pi\left(\tfrac{\pi}{2} - \tfrac{\pi}{3}\right)}{2\pi \cdot \tfrac{\pi}{2}} = \frac{1}{3} \tag{7.11}$$

for the probability that the length of the chord chosen will be less than the radius of the circle.
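The two methods can be simulated side by side. Only $Y_2$ and $Z_2$ enter the chord length in (7.9), so the sketch below does not simulate the angles $Y_1$ and $Z_1$. It assumes $r = 1$ and a fixed seed; the function names are ours.

```python
import math
import random

def chord_by_distance(rng, r):
    """First method: Y2 ~ uniform(0, r) is the distance of the chord's mid-point from the center."""
    y2 = rng.uniform(0, r)
    return 2 * math.sqrt(r * r - y2 * y2)          # (7.9), first form

def chord_by_angle(rng, r):
    """Second method: Z2 ~ uniform(0, pi/2), chord length 2 r cos Z2."""
    z2 = rng.uniform(0, math.pi / 2)
    return 2 * r * math.cos(z2)                     # (7.9), second form

def estimate(method, r=1.0, trials=200_000, seed=3):
    """Fraction of simulated chords shorter than the radius."""
    rng = random.Random(seed)
    return sum(method(rng, r) < r for _ in range(trials)) / trials

print(estimate(chord_by_distance), 1 - math.sqrt(3) / 2)   # compare (7.10)
print(estimate(chord_by_angle), 1 / 3)                     # compare (7.11)
```

Both simulations are "correct"; they simply model different random experiments, which is the point of the paradox.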
It should be noted that random experiments could be performed in such a way that either (7.10) or (7.11) would be the correct probability in the sense of the frequency definition of probability. If a disk of diameter $2r$ were cut out of cardboard and thrown at random on a table ruled with parallel lines a distance $2r$ apart, then one and only one of these lines would cross the disk. All distances from the center would be equally likely, and (7.10) would represent the probability that the chord drawn by the line across the disk would have a length less than $r$. On the other hand, if the disk were held by a pivot through a point on its edge, which point lay upon a certain straight line, and spun randomly about this point, then (7.11) would represent the probability that the chord drawn by the line across the disk would have a length less than $r$. ◄

The following example has many important extensions and practical applications.


► Example 7F. The probability of an uncrowded road. Along a straight road, $L$ miles long, are $n$ distinguishable persons, distributed at random. Show that the probability that no two persons will be less than a distance $d$ miles apart is equal to, for $d$ such that $(n - 1)d \le L$,

$$\left(1 - (n - 1)\frac{d}{L}\right)^n. \tag{7.12}$$

Solution: For $j = 1, 2, \ldots, n$ let $X_j$ denote the position of the $j$th person. We assume that $X_1, X_2, \ldots, X_n$ are independent random variables, each uniformly distributed over the interval 0 to $L$. Their joint probability density function is then given by

$$f_{X_1,X_2,\ldots,X_n}(x_1, x_2, \ldots, x_n) = \begin{cases} \dfrac{1}{L^n}, & 0 < x_1, x_2, \ldots, x_n < L,\\[4pt] 0, & \text{otherwise.}\end{cases} \tag{7.13}$$

Next, for each permutation, or ordered $n$-tuple chosen without replacement, $(i_1, i_2, \ldots, i_n)$ of the integers 1 to $n$, define

$$I(i_1, i_2, \ldots, i_n) = \{(x_1, x_2, \ldots, x_n) : x_{i_1} < x_{i_2} < \cdots < x_{i_n}\}. \tag{7.14}$$

Thus $I(i_1, i_2, \ldots, i_n)$ is a zone of points in $n$-dimensional Euclidean space. There are $n!$ such zones, and they are mutually exclusive. The union of all zones does not include all the points in $n$-dimensional space, since an $n$-tuple $(x_1, x_2, \ldots, x_n)$ that contains two equal components does not lie in any zone. However, we are able to ignore the set of points not included in any of the zones, since this set has probability zero under a continuous probability law. Now the event $B$ that no two persons are less than a distance $d$ apart may be represented as the set of $n$-tuples $(x_1, x_2, \ldots, x_n)$ for which the distance $|x_i - x_j|$ between any two components is greater than $d$. To find the probability of $B$, we must first find the probability of the intersection of $B$ and each zone $I(i_1, i_2, \ldots, i_n)$. We may represent this intersection as follows:

$$\begin{aligned}
B\,I(i_1, \ldots, i_n) = \{(x_1, \ldots, x_n) :\ & 0 < x_{i_1} < L - (n-1)d,\\
& x_{i_1} + d < x_{i_2} < L - (n-2)d,\\
& x_{i_2} + d < x_{i_3} < L - (n-3)d,\ \ldots,\ x_{i_{n-1}} + d < x_{i_n} < L\}.
\end{aligned} \tag{7.15}$$

Consequently,

$$\begin{aligned}
P[B\,I(i_1, \ldots, i_n)] &= \frac{1}{L^n}\int_0^{L-(n-1)d} dx_{i_1} \int_{x_{i_1}+d}^{L-(n-2)d} dx_{i_2} \cdots \int_{x_{i_{n-1}}+d}^{L} dx_{i_n}\\
&= \int_0^{1-(n-1)d'} du_1 \int_{u_1+d'}^{1-(n-2)d'} du_2 \cdots \int_{u_{n-1}+d'}^{1} du_n,
\end{aligned} \tag{7.16}$$

in which we have made the change of variables $u_1 = x_{i_1}/L, \ldots, u_n = x_{i_n}/L$, and have set $d' = d/L$. The last written integral is readily evaluated from the inside out; by induction one sees that, for $k = 2, \ldots, n$,

$$\int_{u_{k-1}+d'}^{1-(n-k)d'} du_k \cdots \int_{u_{n-1}+d'}^{1} du_n = \frac{\left(1 - (n-k+1)d' - u_{k-1}\right)^{n-k+1}}{(n-k+1)!}, \tag{7.17}$$

and integrating the case $k = 2$ over $u_1$ from 0 to $1 - (n-1)d'$ gives $(1 - (n-1)d')^n / n!$. The probability of $B$ is equal to the product of $n!$ and the probability of the intersection of $B$ and any zone $I(i_1, i_2, \ldots, i_n)$. The proof of (7.12) is now complete.
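A simulation check of (7.12), assuming the illustrative values $n = 4$, $L = 1$, and $d = 0.1$, for which (7.12) gives $0.7^4 = 0.2401$. The function names are hypothetical.

```python
import random

def uncrowded_formula(n, L, d):
    """(7.12): P[no two of n uniform points on (0, L) are closer than d]."""
    if (n - 1) * d > L:
        raise ValueError("formula (7.12) requires (n - 1) d <= L")
    return (1 - (n - 1) * d / L) ** n

def uncrowded_simulated(n, L, d, trials=100_000, seed=5):
    rng = random.Random(seed)
    ok = 0
    for _ in range(trials):
        # Sorting the positions identifies the zone I(i_1, ..., i_n) the sample falls in;
        # after sorting it is enough to check adjacent gaps.
        xs = sorted(rng.uniform(0, L) for _ in range(n))
        ok += all(b - a >= d for a, b in zip(xs, xs[1:]))
    return ok / trials

exact = uncrowded_formula(4, 1.0, 0.1)
estimate = uncrowded_simulated(4, 1.0, 0.1)
print(exact, estimate)
```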


In a similar manner one may solve the following problem. 


► Example 7G. Packing cylinders randomly on a rod. Consider a horizontal rod of length $L$ on which $n$ equal cylinders, each of length $c$, are distributed at random. The probability that no two cylinders will be less than $d$ apart is equal to, for $d$ such that $L > nc + (n - 1)d$,

$$\left(1 - (n - 1)\frac{d}{L - nc}\right)^n. \tag{7.18}$$ ◄

L—ne 


The foregoing considerations, together with (6.2) of Chapter 2, establish the following extremely useful result.

The Random Division of an Interval or a Circle. Suppose that a straight line of length $L$ is divided into $n$ subintervals by $(n - 1)$ points chosen at random on the line, or that a circle of circumference $L$ is divided into $n$ subintervals by $n$ points chosen at random on the circle. Then the probability $P_k$ that exactly $k$ of the subintervals will exceed $d$ in length is given by

$$P_k = \binom{n}{k} \sum_{j=k}^{[L/d]} (-1)^{j-k} \binom{n-k}{j-k} \left(1 - j\frac{d}{L}\right)^{n-1}, \tag{7.19}$$

in which $[L/d]$ denotes the largest integer not exceeding $L/d$, and $\binom{n-k}{j-k} = 0$ for $j > n$.

It may clarify the meaning of (7.19) to express it in terms of random variables. Let $X_1, X_2, \ldots, X_{n-1}$ be the coordinates of the $n - 1$ points chosen randomly on the line (a similar discussion may be given for the circle). Then $X_1, X_2, \ldots, X_{n-1}$ are independent random variables, each uniformly distributed on the interval 0 to $L$. Define new random variables $Y_1, Y_2, \ldots, Y_{n-1}$: $Y_1$ is equal to the minimum of $X_1, X_2, \ldots, X_{n-1}$; $Y_2$ is equal to the second smallest number among $X_1, X_2, \ldots, X_{n-1}$; and so on, up to $Y_{n-1}$, which is equal to the maximum of $X_1, X_2, \ldots, X_{n-1}$. The random variables $Y_1, Y_2, \ldots, Y_{n-1}$ thus constitute a reordering of the random variables $X_1, X_2, \ldots, X_{n-1}$ according to increasing magnitude. For this reason, the random variables $Y_1, Y_2, \ldots, Y_{n-1}$ are called the order statistics corresponding to $X_1, X_2, \ldots, X_{n-1}$. The random variable $Y_k$, for $k = 1, 2, \ldots, n - 1$, is usually spoken of as the $k$th smallest value among $X_1, X_2, \ldots, X_{n-1}$.

The lengths $W_1, W_2, \ldots, W_n$ of the $n$ successive subintervals into which the $(n - 1)$ randomly chosen points divide the line may now be expressed:

$$W_1 = Y_1,\quad W_2 = Y_2 - Y_1,\quad \ldots,\quad W_{n-1} = Y_{n-1} - Y_{n-2},\quad W_n = L - Y_{n-1}. \tag{7.20}$$

The probability $P_k$ is the probability that exactly $k$ of the $n$ events $[W_1 > d], [W_2 > d], \ldots, [W_n > d]$ will occur. To prove (7.19), one needs only to verify that for any integer $j$ the probability that $j$ specified subintervals will exceed $d$ in length is equal to

$$\left(1 - j\frac{d}{L}\right)^{n-1} \quad \text{if } jd \le L \qquad (\text{and to 0 if } jd > L). \tag{7.21}$$
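Formula (7.19) and the order-statistic description above can be compared numerically. The sketch below implements (7.19) in the form written above (the function names are hypothetical), checks that $P_0, \ldots, P_n$ sum to 1, and compares them with a simulation of the line case for the illustrative values $n = 5$, $L = 1$, $d = 0.15$.

```python
import math
import random

def p_exactly_k(n, k, L, d):
    """Formula (7.19): P[exactly k of the n subintervals exceed d]."""
    total = 0.0
    for j in range(k, n + 1):
        if j * d > L:            # terms with j > [L/d] do not appear in the sum
            break
        total += (-1) ** (j - k) * math.comb(n - k, j - k) * (1 - j * d / L) ** (n - 1)
    return math.comb(n, k) * total

def simulate_line(n, L, d, trials=50_000, seed=17):
    """Cut (0, L) at n - 1 uniform points and tally how many of the n pieces exceed d."""
    rng = random.Random(seed)
    counts = [0] * (n + 1)
    for _ in range(trials):
        cuts = sorted(rng.uniform(0, L) for _ in range(n - 1))   # order statistics Y_1..Y_{n-1}
        edges = [0.0] + cuts + [L]
        lengths = [b - a for a, b in zip(edges, edges[1:])]      # W_1, ..., W_n as in (7.20)
        counts[sum(w > d for w in lengths)] += 1
    return [c / trials for c in counts]

n, L, d = 5, 1.0, 0.15
formula = [p_exactly_k(n, k, L, d) for k in range(n + 1)]
simulated = simulate_line(n, L, d)
for k in range(n + 1):
    print(k, round(formula[k], 4), round(simulated[k], 4))
```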
References to the large variety of problems to which (7.19) is applicable may be found in two papers: J. O. Irwin, "A Unified Derivation of Some Well-known Frequency Distributions of Interest in Biometry and Statistics," Journal of the Royal Statistical Society, Series A, Vol. 118 (1955), pp. 389-398, and L. Takács, "On a general probability theorem and its application in the theory of stochastic processes," Proceedings of the Cambridge Philosophical Society, Vol. 54 (1958), pp. 219-224.


THEORETICAL EXERCISES

7.1. Buffon's Needle Problem. A smooth table is ruled with equidistant parallel lines at distance $D$ apart. A needle of length $L \le D$ is dropped on the table. Show that the probability that it will cross one of the lines is $2L/(\pi D)$. For an account of some experiments made in connection with the Buffon Needle Problem see J. V. Uspensky, Introduction to Mathematical Probability, McGraw-Hill, New York, 1937, pp. 112-113.

7.2. A straight line of unit length is divided into $n$ subintervals by $n - 1$ points chosen at random. For $r = 1, 2, \ldots, n - 1$, show that the probability that none of $r$ specified subintervals will be less than $d$ in length is equal to

$$(1 - rd)^{n-1}. \tag{7.22}$$

Hence, using (6.3) of Chapter 2, conclude that the probability that at least 1 of the $n$ subintervals will exceed $d$ in length is equal to

$$n(1 - d)^{n-1} - \binom{n}{2}(1 - 2d)^{n-1} + \cdots + (-1)^{r-1}\binom{n}{r}(1 - rd)^{n-1} + \cdots, \tag{7.23}$$

the series continuing as long as the terms $1 - rd$, $r = 1, 2, \ldots$, are positive.


EXERCISES

7.1. A young man and a young lady plan to meet between 5 and 6 P.M., each agreeing not to wait more than 10 minutes for the other. Find the probability that they will meet if they arrive independently at random times between 5 and 6 P.M.

7.2. Consider light bulbs produced by a machine for which it is known that the life $X$ in hours of a light bulb produced by the machine is a random variable with probability density function

$$f_X(x) = \begin{cases}\dfrac{1}{1000} e^{-x/1000}, & x > 0,\\[4pt] 0, & \text{otherwise.}\end{cases}$$

Consider a box containing 100 such bulbs, selected randomly from the output of the machine.

(i) What is the probability that a bulb selected randomly from the box will have a lifetime greater than 1020 hours?

(ii) What is the probability that a sample of 5 bulbs selected randomly from the box will contain (a) at least 1 bulb, (b) 4 or more bulbs with a lifetime greater than 1020 hours?

(iii) Find approximately the probability that the box will contain between … and 40 bulbs, inclusive, with a lifetime greater than 1020 hours.


7.3. Six soldiers take up random positions on a road 2 miles long. What is 
the probability that the distance between any two soldiers will be more 
than (i) 5, (ii) 4, (iii) 1 of a mile? 

7.4. Another version of Bertrand's paradox. Let a chord be drawn at random in a given circle. What is the probability that the length of the chord will be greater than the side of the equilateral triangle inscribed in that circle?

7.5. A point is chosen randomly on each of 2 adjacent sides of a square. Find the probability that the area of the triangle formed by the sides of the square and the line joining the 2 points will be (i) less than … of the area of the square, (ii) greater than … of the area of the square.

7.6. Three points are chosen randomly on the circumference of a circle. What is the probability that there will be a semicircle in which all will lie?

7.7. A line is divided into 3 subintervals by choosing 2 points randomly on the line. Find the probability that the 3 line segments thus formed could be made to form the sides of a triangle.

7.8. Find the probability that the roots of the equation $x^2 + 2X_1 x + X_2 = 0$ will be real if (i) $X_1$ and $X_2$ are randomly chosen between 0 and 1, (ii) $X_1$ is randomly chosen between 0 and 1, and $X_2$ is randomly chosen between -1 and 1.


7.9. In the interval 0 to 1 


Will be to the left of th 


7.0. A straight line 


8. THE PROBABILITY LAW OF A FUNCTION OF A RANDOM VARIABLE

In this section we develop formulas for the probability law of a random variable $Y$ which arises as a function of another random variable $X$, so that for some Borel function $g(\cdot)$

$$Y = g(X). \tag{8.1}$$

SEC. 8 PROBABILITY LAW OF A FUNCTION OF A RANDOM VARIABLE 309

We seek the distribution function $F_Y(\cdot)$ of $Y$, and also its probability density function $f_Y(\cdot)$ or its probability mass function $p_Y(\cdot)$ in cases in which these functions exist. From (2.2) we obtain the following formula for the value $F_Y(y)$ at the real number $y$ of the distribution function $F_Y(\cdot)$:

$$F_Y(y) = P_X[\{x : g(x) \le y\}] \quad \text{if } Y = g(X). \tag{8.2}$$


(82) Fry) = Рх: ge) < if Y= 80). 


Of great importance is the special case of a linear function $g(x) = ax + b$, in which $a$ and $b$ are given real numbers such that $a > 0$ and $-\infty < b < \infty$. The distribution function of the random variable $Y = aX + b$ is given by

$$F_{aX+b}(y) = P[aX + b \le y] = P\left[X \le \frac{y - b}{a}\right] = F_X\left(\frac{y - b}{a}\right). \tag{8.3}$$

If $X$ is continuous, so is $Y = aX + b$, with a probability density function for any real number $y$ given by

$$f_{aX+b}(y) = \frac{1}{a}\,f_X\left(\frac{y - b}{a}\right). \tag{8.4}$$

If $X$ is discrete, so is $Y = aX + b$, with a probability mass function for any real number $y$ given by

$$p_{aX+b}(y) = p_X\left(\frac{y - b}{a}\right). \tag{8.5}$$


Next, let us consider $g(x) = x^2$. Then $Y = X^2$. For $y < 0$, $\{x : x^2 \le y\}$ is the empty set of real numbers. Consequently,

$$F_{X^2}(y) = 0 \quad \text{for } y < 0. \tag{8.6}$$

For $y \ge 0$,

$$F_{X^2}(y) = P[X^2 \le y] = P[-\sqrt{y} \le X \le \sqrt{y}] = F_X(\sqrt{y}) - F_X(-\sqrt{y}) + p_X(-\sqrt{y}). \tag{8.7}$$

One sees from (8.7) that if $X$ possesses a probability density function $f_X(\cdot)$ then the distribution function $F_{X^2}(\cdot)$ of $X^2$ may be expressed as an integral; this is the necessary and sufficient condition that $X^2$ possess a probability density function $f_{X^2}(\cdot)$. To evaluate the value of $f_{X^2}(y)$ at a real number $y$, we differentiate (8.7) and (8.6) with respect to $y$. We obtain

$$f_{X^2}(y) = \begin{cases} \left[f_X(\sqrt{y}) + f_X(-\sqrt{y})\right]\dfrac{1}{2\sqrt{y}}, & y > 0,\\[4pt] 0, & y < 0.\end{cases} \tag{8.8}$$
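One way to check (8.8) is to compare it with a numerical derivative of (8.7). The sketch below assumes $X$ is standard normal (an illustrative choice), in which case (8.8) should reproduce the density $e^{-y/2}/\sqrt{2\pi y}$ of the chi-square law with 1 degree of freedom. Helper names are ours.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def Phi(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def cdf_of_square(y, F):
    """(8.6)-(8.7) for a continuous X, where the p_X(-sqrt(y)) term is zero."""
    if y < 0:
        return 0.0
    r = math.sqrt(y)
    return F(r) - F(-r)

def density_of_square(y, f):
    """(8.8) for y > 0; returns 0 for y <= 0 (the value at y = 0 is a convention)."""
    if y <= 0:
        return 0.0
    r = math.sqrt(y)
    return (f(r) + f(-r)) / (2 * r)

h = 1e-5
checks = {}
for y in (0.25, 1.0, 2.0, 4.0):
    finite_difference = (cdf_of_square(y + h, Phi) - cdf_of_square(y - h, Phi)) / (2 * h)
    checks[y] = (density_of_square(y, phi), finite_difference)
    print(y, checks[y])
```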


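It may help the reader to recall the so-called chain rule for differentiation of a function of a function, required to obtain (8.8), if we point out that

$$\begin{aligned}
\frac{d}{dy} F_X(\sqrt{y}) &= \lim_{h \to 0} \frac{F_X(\sqrt{y+h}) - F_X(\sqrt{y})}{h}\\
&= \lim_{h \to 0} \frac{F_X(\sqrt{y+h}) - F_X(\sqrt{y})}{\sqrt{y+h} - \sqrt{y}} \cdot \lim_{h \to 0} \frac{\sqrt{y+h} - \sqrt{y}}{h}\\
&= F_X'(\sqrt{y})\,\frac{d}{dy}\sqrt{y}.
\end{aligned} \tag{8.9}$$

If $X$ is discrete, it then follows from (8.7) that $X^2$ is discrete, with probability mass function

$$p_{X^2}(y) = \begin{cases} p_X(\sqrt{y}) + p_X(-\sqrt{y}), & y > 0,\\ p_X(0), & y = 0,\\ 0, & y < 0.\end{cases} \tag{8.10}$$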

> Example 8A. The random sine wave. Let 

(8.11) 


X = A sin 0, 
in which the amplitud 
a random variable uni 
distribution function 


e A is a known Positive consta 
formly distributed on the inter 
Fx(-) for lt] < A is given by 

Fy) = PIA sin 0 = 


nt and the phase 0 is 
val —7/2 to z[2. The 


2] = P[sin 0 S2/A] 


= PIO < sin? (spy = F,(sin- =) 


ea 


the Probability density function is given by 


(8.12) љо) = (1 = 6) *. lal x A 


=0 otherwise. 


Consequently, 


If a projectile is fired at an angle z to the earth wi 
v, then the point at Which the Projectile retur 
R from the point at which ji ч given by the equation 
R = (08/9) sin 2%, in Which g is the Bravitationa] Constant, equal to 


SEC. 8 PROBABILITY LAW OF A FUNCTION OF A RANDOM VARIABLE 311 


980 cm/sec? or 32.2 ft/sec?. If the firing angle is a random variable with 
а known probability law, then the range A of the projectile is also a random 
Variable with a known probability law. 

A random variable similar to the one given in (8.11) was encountered
in the discussion of Bertrand's paradox in section 7; namely, the random
variable Y = 2r \cos Z, in which Z is uniformly distributed over the
interval 0 to π/2. ◄
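The short Monte Carlo sketch below illustrates Example 8A, assuming A = 2 and using Python's random module. It simulates θ uniform on (-π/2, π/2) and compares the fraction of values A sin θ ≤ x with F_X(x) = 1/2 + sin^{-1}(x/A)/π. It also checks that differentiating this F_X reproduces (8.12). The function names are illustrative.

```python
import math
import random


def sine_wave_cdf(x, A):
    """F_X(x) for X = A sin(theta), theta uniform on (-pi/2, pi/2), as in Example 8A."""
    if x <= -A:
        return 0.0
    if x >= A:
        return 1.0
    return 0.5 + math.asin(x / A) / math.pi


def sine_wave_pdf(x, A):
    """Density (8.12); it is unbounded near x = -A and x = A."""
    if abs(x) >= A:
        return 0.0
    return 1.0 / (math.pi * A * math.sqrt(1.0 - (x / A) ** 2))


def empirical_cdf(x, A, n=100_000, seed=1):
    """Fraction of simulated values A sin(theta) that are <= x."""
    rng = random.Random(seed)
    hits = sum(A * math.sin(rng.uniform(-math.pi / 2, math.pi / 2)) <= x for _ in range(n))
    return hits / n


A = 2.0
print(sine_wave_cdf(0.5, A), empirical_cdf(0.5, A))
```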


► Example 8B. The positive part of a random variable. Given any real
number x, we define the symbols x^+ and x^- as follows:

(8.13)  x^+ = x   if x \ge 0,        x^- = 0    if x \ge 0
            = 0   if x < 0,              = -x   if x < 0.

Then x = x^+ - x^- and |x| = x^+ + x^-. Given a random variable X, let
Y = X^+. We call Y the positive part of X. The distribution function
of the positive part of X is given by

(8.14)  F_{X^+}(y) = 0        if y < 0
                   = F_X(0)   if y = 0
                   = F_X(y)   if y > 0.

Thus, if X is normally distributed with parameters m = 0 and σ = 1,

(8.15)  F_{X^+}(y) = 0                   if y < 0
                   = \Phi(0) = 1/2       if y = 0
                   = \Phi(y)             if y > 0.

The positive part X^+ of a normally distributed random variable is neither
continuous nor discrete but has a distribution function of mixed type. ◄
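The minimal simulation below assumes X is standard normal, as in (8.15), and shows the mixed character of X^+. About half of the simulated values sit exactly at 0 (the jump of size Φ(0) = 1/2). The rest are spread continuously over y > 0.

```python
import random
from statistics import NormalDist


def positive_part_cdf(y):
    """F_{X+}(y) from (8.15), assuming X is normal with m = 0 and sigma = 1."""
    if y < 0:
        return 0.0
    return NormalDist(0.0, 1.0).cdf(y)  # equals Phi(0) = 1/2 at y = 0


rng = random.Random(7)
samples = [max(rng.gauss(0.0, 1.0), 0.0) for _ in range(100_000)]  # realizations of X+

# A continuous law would give the single point 0 probability zero; here it is about 1/2.
mass_at_zero = sum(s == 0.0 for s in samples) / len(samples)
print(positive_part_cdf(-1e-9), positive_part_cdf(0.0), mass_at_zero)
```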


The Calculus of Probability Density Functions. Let X be a continuous
random variable, and let Y = g(X). Unless some conditions are imposed
on the function g(·), it is not necessarily true that Y is continuous. For
example, Y = X^+ is not continuous if X has a positive probability of
being negative. We now state some conditions on the function g(·) under
which g(X) is a continuous random variable if X is a continuous random
variable. At the same time, we give formulas for the probability density
function of g(X) in terms of the probability density function of X and the
derivatives of g(·).

We first consider the case in which the function g(·) is differentiable at
every real number x and, further, either g'(x) > 0 for all x or g'(x) < 0 for
all x. We may then prove the following facts (see R. Courant,
Differential and Integral Calculus, Interscience, New York, 1937, pp. 144-145):
(i) as x goes from -∞ to ∞, g(x) is either monotone increasing or
monotone decreasing; (ii) the limits

α' = \lim_{x \to -\infty} g(x),   β' = \lim_{x \to \infty} g(x)

exist (although they may be infinite), and we define

(8.16)  α = \min(α', β'),   β = \max(α', β');

(iii) for every value of y such that α < y < β there exists exactly one value
of x such that y = g(x) [this value of x is denoted by g^{-1}(y)]; (iv) the
inverse function x = g^{-1}(y) is differentiable and its derivative is given by

(8.17)  \frac{dx}{dy} = \frac{d}{dy} g^{-1}(y) = \frac{1}{\frac{d}{dx} g(x)} = \frac{1}{dy/dx},   evaluated at x = g^{-1}(y).

For example, let g(x) = \tan^{-1} x. Then g'(x) = 1/(1 + x^2) is positive for
all x. Here α = -π/2 and β = π/2. The inverse function is tan y, defined
for |y| < π/2. The derivative of the inverse function is given by dx/dy =
sec^2 y. One sees that (dy/dx)^{-1} = 1 + (tan y)^2 is equal to dx/dy, as
asserted by (8.17). We may now state the following theorem:

If y = g(x) is differentiable for all x, and either g'(x) > 0 for all x or
g'(x) < 0 for all x, and if X is a continuous random variable, then Y = g(X)
is a continuous random variable with probability density function given by

(8.18)  f_Y(y) = f_X[g^{-1}(y)] \left| \frac{d}{dy} g^{-1}(y) \right|   if α < y < β
               = 0   otherwise,

in which α and β are defined by (8.16).
To illustrate the use of (8.18), let us note the formula: if X is a continuous
random variable, then

(8.19)  f_{\tan^{-1} X}(y) = f_X(\tan y) \sec^2 y   for |y| < π/2
                           = 0   otherwise.
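The Python sketch below shows how (8.18) can be applied mechanically: it evaluates f_X[g^{-1}(y)] |dg^{-1}(y)/dy| with a numerical derivative. The choice of X is an assumption made for the check. X is taken to be Cauchy with density 1/[π(1 + x^2)]. In that case (8.19) gives f_X(tan y) sec^2 y = 1/π, so tan^{-1} X should be uniform on (-π/2, π/2).

```python
import math


def density_of_monotone_transform(f_x, g_inverse, y, h=1e-6):
    """(8.18): f_Y(y) = f_X(g^{-1}(y)) * |d g^{-1}(y) / dy|, derivative by central difference."""
    derivative = (g_inverse(y + h) - g_inverse(y - h)) / (2.0 * h)
    return f_x(g_inverse(y)) * abs(derivative)


def standard_cauchy_pdf(x):
    """Assumed law of X for this check: the density 1 / (pi (1 + x^2))."""
    return 1.0 / (math.pi * (1.0 + x * x))


# Y = arctan X, so g^{-1}(y) = tan y for |y| < pi/2, as in (8.19).
for y in (-1.2, -0.3, 0.0, 0.9):
    print(y, density_of_monotone_transform(standard_cauchy_pdf, math.tan, y))
```

Every printed density is (up to rounding) 1/π, the uniform density on an interval of length π.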


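To prove (8.18), we distinguish two cases: the case in which the function
y = g(x) is monotone increasing and that in which it is monotone decreasing.
In the first case the distribution function of Y for α < y < β
may be written

(8.20)  F_Y(y) = P[g(X) \le y] = P[X \le g^{-1}(y)] = F_X[g^{-1}(y)].

In the second case, for α < y < β,

(8.20')  F_Y(y) = P[g(X) \le y] = P[X \ge g^{-1}(y)] = 1 - F_X[g^{-1}(y)].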




If (8.20) is differentiated with respect to y, (8.18) is obtained. We leave it
to the reader to consider the case in which y < α or y > β.

One may extend (8.18) to the case in which the derivative g'(x) is
continuous and vanishes at only a finite number of points. We leave the
proof of the following assertion to the reader.

Let y = g(x) be differentiable for all x and assume that the derivative
g'(x) is continuous and nonzero at all but a finite number of values of x.
Then, to every real number y, (i) there is a positive integer m(y) and points
x_1(y), x_2(y), ..., x_{m(y)}(y) such that, for k = 1, 2, ..., m(y),

(8.21)  g[x_k(y)] = y,   g'[x_k(y)] \ne 0,

or (ii) there is no value of x such that g(x) = y and g'(x) ≠ 0; in this case
we write m(y) = 0. If X is a continuous random variable, then Y = g(X)
is a continuous random variable with a probability density function given
by

(8.22)  f_Y(y) = \sum_{k=1}^{m(y)} f_X[x_k(y)] \, |g'[x_k(y)]|^{-1}   if m(y) > 0
               = 0   if m(y) = 0.


We obtain as an immediate consequence of (8.22): if X is a continuous
random variable, then

(8.23)  f_{|X|}(y) = f_X(y) + f_X(-y)   for y > 0
                   = 0   for y < 0;

(8.24)  f_{|X|^{1/2}}(y) = 2y \left[ f_X(y^2) + f_X(-y^2) \right]   for y > 0
                         = 0   for y < 0.

Equations (8.23) and (8.24) may also be obtained directly, by using the
same technique with which (8.8) was derived.
The Probability Integral Transformation. It is a somewhat surprising
fact, of great usefulness both in theory and in practice, that to obtain a
random sample of a random variable X it suffices to obtain a random
sample of a random variable U, which is uniformly distributed over the
interval 0 to 1. This follows from the fact that the distribution function
F_X(·) of the random variable X is a nondecreasing function. Consequently,
an inverse function F_X^{-1}(·) may be defined for values of y between 0 and 1:
F_X^{-1}(y) is equal to the smallest value of x satisfying the condition that
F_X(x) ≥ y.

► Example 8C. If X is normally distributed with parameters m and σ,
then F_X(x) = Φ[(x - m)/σ] and F_X^{-1}(y) = m + σΦ^{-1}(y), in which Φ^{-1}(y)
denotes the value of x satisfying the equation Φ(x) = y. ◄
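Defining F_X^{-1}(y) as the smallest x with F_X(x) ≥ y also covers distribution functions with jumps or flat stretches. The minimal bisection sketch below is a hypothetical helper, not a method from the text. It assumes the distribution puts essentially all its mass in [-10^6, 10^6]. It applies the definition to a fair die and to the normal law of Example 8C. For the normal law, the result can be compared with m + σΦ^{-1}(y) computed by statistics.NormalDist.inv_cdf.

```python
import math
from statistics import NormalDist


def generalized_inverse(cdf, y, lo=-1e6, hi=1e6, tol=1e-10):
    """Smallest x with cdf(x) >= y, for 0 < y < 1, by bisection on a nondecreasing cdf.

    Assumes cdf(lo) < y <= cdf(hi), i.e. essentially all the mass lies in [lo, hi].
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) >= y:
            hi = mid  # mid satisfies the condition, so the smallest such x is <= mid
        else:
            lo = mid  # mid fails, so the smallest such x is > mid
    return hi


def fair_die_cdf(x):
    """Distribution function of a fair die: a step function with jumps at 1, ..., 6."""
    return min(max(math.floor(x), 0), 6) / 6


m, sigma = 10.0, 2.0  # illustrative parameters for Example 8C
normal_cdf = NormalDist(m, sigma).cdf

# For the die, F(3) = 1/2 while F(x) = 1/3 for 2 <= x < 3, so the inverse at 1/2 is 3.
print(generalized_inverse(fair_die_cdf, 0.5))
print(generalized_inverse(normal_cdf, 0.975), m + sigma * NormalDist().inv_cdf(0.975))
```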




In terms of the inverse function F_X^{-1}(·) to the distribution function
F_X(·) of the random variable X, we may state the following theorem, the
proof of which we leave as an exercise for the reader.


THEOREM 8A. Let U_1, U_2, ..., U_n be independent random variables,
each uniformly distributed over the interval 0 to 1. The random variables
defined by

(8.25)  X_1 = F_X^{-1}(U_1),   X_2 = F_X^{-1}(U_2),   ...,   X_n = F_X^{-1}(U_n)

are then a random sample of the random variable X. Conversely, if
X_1, X_2, ..., X_n are a random sample of the random variable X and if the
distribution function F_X(·) is continuous, then the random variables

(8.26)  U_1 = F_X(X_1),   U_2 = F_X(X_2),   ...,   U_n = F_X(X_n)

are a random sample of the random variable U = F_X(X), which is
uniformly distributed on the interval 0 to 1.

The transformation of a random variable X into a uniformly distributed
random variable U = F_X(X) is called the probability integral transformation.
It plays an important role in the modern theory of goodness-of-fit tests for
distribution functions; see T. W. Anderson and D. Darling, "Asymptotic
theory of certain goodness of fit criteria based on stochastic processes,"
Annals of Mathematical Statistics, Vol. 23 (1952), pp. 193-212.
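Theorem 8A is the basis of inverse-transform sampling. The sketch below assumes, for illustration only, an exponential law F(x) = 1 - e^{-λx} with λ = 2, whose inverse is F^{-1}(u) = -log(1 - u)/λ. It builds a sample by (8.25) and then maps it back by (8.26). The function names are hypothetical.

```python
import math
import random


def exponential_inverse_cdf(u, lam):
    """F^{-1}(u) for the exponential law F(x) = 1 - exp(-lam x), x >= 0 (illustrative choice)."""
    return -math.log(1.0 - u) / lam


def inverse_transform_sample(inverse_cdf, n, seed=0):
    """(8.25): X_k = F^{-1}(U_k), with U_1, ..., U_n independent and uniform on [0, 1)."""
    rng = random.Random(seed)
    return [inverse_cdf(rng.random()) for _ in range(n)]


lam = 2.0
xs = inverse_transform_sample(lambda u: exponential_inverse_cdf(u, lam), 100_000)

# Converse direction, (8.26): U_k = F(X_k) should again be uniform on the interval 0 to 1.
us = [1.0 - math.exp(-lam * x) for x in xs]

print(sum(xs) / len(xs))                      # should be close to the exponential mean 1 / lam
print(sum(u <= 0.25 for u in us) / len(us))   # should be close to 0.25
```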


EXERCISES 


8.1. Let X have a χ^2 distribution with parameters n and σ. Show that
Y = \sqrt{X/n} has a χ distribution with parameters n and σ.

8.2. The temperature T, recorded in degrees Fahrenheit, obeys a normal
probability law with mean 98.6 and variance 2. The temperature θ measured
in degrees centigrade is related to T by θ = (5/9)(T - 32). Describe the
probability law of θ.

8.3. The magnitude v of the velocity of ... T is a random variable, ...
probability density ... Describe in words the probability law of E.

8.4. A ... stock, the profit Y from ... (X, 10). Describe ...




8.5. Find the probability density function of X = cos θ, in which θ is uniformly
distributed on -π to π.

8.6. Find the probability density function of the random variable Y = A sin ωt,
in which A and ω are known constants and t is a random variable uniformly
distributed on the interval -T to T, in which (i) T is a constant such that
0 < ωT ≤ π/2, (ii) T = n(2π/ω) for some integer n ≥ 1.

8.7. Find the probability density function of Y = e^X, in which X is normally
distributed with parameters m and σ. The random variable Y is said to
have a lognormal distribution with parameters m and σ. (The importance
and usefulness of the lognormal distribution is discussed by J. Aitchison
and J. A. C. Brown, The Lognormal Distribution, Cambridge University
Press, 1957.)


In exercises 8.8 to 8.11 let X be uniformly distributed on (a) the interval 0 to 1,
(b) the interval -1 to 1. Find and sketch the probability density function of the
functions given.

8.8. (i) X^2, (ii) \sqrt{|X|}.
8.9. (i) e^X, (ii) -\log_e |X|.
8.10. (i) cos πX, (ii) tan πX.
8.11. (i) 2X + 1, (ii) 2X^2 + 1.

In exercises 8.12 to 8.15 let X be normally distributed with parameters m = 0
and σ = 1. Find and sketch the probability density functions of the functions
given.

8.12. (i) X^2, (ii) e^X.
8.13. (i) X^{...}, (ii) |X|^{...}.
8.14. (i) 2X + 1, (ii) 2X^2 + 1.
8.15. (i) sin πX, (ii) tan^{-1} X.

8.16. At time t = 0, a particle is located at the point x = 0 on an x-axis. At a
time T randomly selected from the interval 0 to 1, the particle is suddenly
given a velocity v in the positive x-direction. For any time t > 0 let X(t)
denote the position of the particle at time t. Then X(t) = 0, if t < T,
and X(t) = v(t - T), if t ≥ T. Find and sketch the distribution function
of the random variable X(t) for any given time t > 0.

In exercises 8.17 to 8.20 suppose that the amplitude X(t) at a time t of the signal
emitted by a certain random signal generator is known to be a random variable
(a) uniformly distributed over the interval -1 to 1, (b) normally distributed with
parameters m = 0 and σ > 0, (c) Rayleigh distributed with parameter σ.

8.17. The waveform X(t) is passed through a squaring circuit; the output Y(t)
of the squaring circuit at time t is assumed to be given by Y(t) = X^2(t).
Find and sketch the probability density function of Y(t) for any time t > 0.




8.18. The waveform X(t) is passed through a rectifier, giving as its output
Y(t) = |X(t)|. Describe the probability law of Y(t) for any time t > 0.

8.19. The waveform X(t) is passed through a half-wave rectifier, giving as its
output Y(t) = X^+(t), the positive part of X(t). Describe the probability
law of Y(t) for any t > 0.

8.20. The waveform X(t) is passed through a clipper, giving as its output
Y(t) = g[X(t)], where g(x) = 1 or 0, depending on whether x > 0 or
x < 0. Find and sketch the probability mass function of Y(t) for any
t > 0.

8.21. Prove that the function given in (8.12) is a probability density function.
Does the fact that the function is unbounded cause any difficulty?


9. THE PROBABILITY LAW OF A FUNCTION
OF RANDOM VARIABLES

In this section we develop formulas for the probability law of a random
variable Y, which arises as a function Y = g(X_1, X_2, ..., X_n) of n jointly
distributed random variables X_1, X_2, ..., X_n. All of the formulas
developed in this section are consequences of the following basic theorem.

THEOREM 9A. Let X_1, X_2, ..., X_n be n jointly distributed random
variables, with joint probability law P_{X_1, X_2, ..., X_n}[·]. Let Y = g(X_1,
X_2, ..., X_n). Then, for any real number y,

(9.1)  F_Y(y) = P[Y \le y] = P_{X_1, X_2, ..., X_n}[\{(x_1, x_2, ..., x_n): g(x_1, x_2, ..., x_n) \le y\}].

Equation (9.1) holds since the event that Y ≤ y is the same as the event
that the observed values of the random variables X_1, X_2, ..., X_n lie in
the set of n-tuples {(x_1, ..., x_n): g(x_1, ..., x_n) ≤ y}.

We are particularly interested in the case in which the random variables
X_1, X_2, ..., X_n are jointly continuous, with joint probability density
function f_{X_1, X_2, ..., X_n}(·, ·, ..., ·). Then (9.1) may be written

(9.2)  F_Y(y) = \int \cdots \int_{\{(x_1, ..., x_n): g(x_1, ..., x_n) \le y\}} f_{X_1, ..., X_n}(x_1, ..., x_n) \, dx_1 \, dx_2 \cdots dx_n.

To begin with, let us obtain the probability law of the sum of two
jointly continuous random variables X_1 and X_2 with a joint probability


SEC. 9 PROBABILITY LAW OF A FUNCTION OF RANDOM VARIABLES 317 
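density function f_{X_1,X_2}(·, ·). Let Y = X_1 + X_2. Then

(9.3)  F_Y(y) = P[X_1 + X_2 \le y] = P_{X_1,X_2}[\{(x_1, x_2): x_1 + x_2 \le y\}]
             = \iint_{\{(x_1, x_2): x_1 + x_2 \le y\}} f_{X_1,X_2}(x_1, x_2) \, dx_1 \, dx_2
             = \int_{-\infty}^{\infty} dx_1 \int_{-\infty}^{y - x_1} dx_2 \, f_{X_1,X_2}(x_1, x_2)
             = \int_{-\infty}^{\infty} dx_1 \int_{-\infty}^{y} dx_2 \, f_{X_1,X_2}(x_1, x_2 - x_1).

By differentiation of the last equation in (9.3), we obtain the formula for
the probability density function of X_1 + X_2: for any real number y

(9.4)  f_{X_1+X_2}(y) = \int_{-\infty}^{\infty} dx_1 \, f_{X_1,X_2}(x_1, y - x_1) = \int_{-\infty}^{\infty} dx_2 \, f_{X_1,X_2}(y - x_2, x_2).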


If the random variables X_1 and X_2 are independent, then for any real
number y

(9.5)  f_{X_1+X_2}(y) = \int_{-\infty}^{\infty} dx \, f_{X_1}(x) f_{X_2}(y - x) = \int_{-\infty}^{\infty} dx \, f_{X_1}(y - x) f_{X_2}(x).


The mathematical operation involved in (9.5) arises in many parts of
mathematics. Consequently, it has been given a name. Consider three
functions f_1(·), f_2(·), and f_3(·), which are such that for every real number y

(9.6)  f_3(y) = \int_{-\infty}^{\infty} f_1(x) f_2(y - x) \, dx;

the function f_3(·) is then said to be the convolution of the functions f_1(·)
and f_2(·), and in symbols we write f_3(·) = f_1(·) * f_2(·).
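The sketch below illustrates (9.6) numerically by approximating the convolution integral with the midpoint rule. The function names are hypothetical. Taking f_1 and f_2 both uniform on (0, 1) is an illustrative choice. It should reproduce the triangular density of (9.32) with a = 0 and b = 1, namely y on (0, 1) and 2 - y on (1, 2).

```python
def convolve_densities(f1, f2, y, lo, hi, steps=20_000):
    """Approximate f3(y) = integral of f1(x) f2(y - x) dx, as in (9.6), by the midpoint rule.

    The interval [lo, hi] must contain the region where f1 is nonzero.
    """
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width
        total += f1(x) * f2(y - x)
    return total * width


def uniform_0_1(x):
    """Density of a random variable uniform on the interval 0 to 1."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


for y in (0.3, 1.0, 1.5):
    print(y, convolve_densities(uniform_0_1, uniform_0_1, y, 0.0, 1.0))
```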

In terms of the notion of convolution, we may express (9.5) as follows.
The probability density function f_{X_1+X_2}(·) of the sum of two independent
continuous random variables is the convolution of the probability density
functions f_{X_1}(·) and f_{X_2}(·) of the random variables.

One can prove similarly that if the random variables X_1 and X_2 are
jointly discrete then the probability mass function of their sum, X_1 + X_2,
for any real number y is given by

(9.7)  p_{X_1+X_2}(y) = \sum_{\text{over all } x \text{ such that } p_{X_1,X_2}(x,\, y - x) > 0} p_{X_1,X_2}(x, y - x)
                      = \sum_{\text{over all } x \text{ such that } p_{X_1,X_2}(y - x,\, x) > 0} p_{X_1,X_2}(y - x, x).
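For independent discrete random variables, p_{X_1,X_2}(x, y - x) = p_{X_1}(x) p_{X_2}(y - x), and (9.7) becomes a discrete convolution. The sketch below uses exact fractions, and its helper names are hypothetical. The fair-die and binomial inputs are illustrative. The binomial case checks part (ii) of theorem 9B below.

```python
from collections import defaultdict
from fractions import Fraction
from math import comb


def convolve_pmfs(p1, p2):
    """p_{X1+X2}(y) = sum over x of p1(x) p2(y - x): (9.7) for independent X1, X2.

    Each pmf is a dict mapping a value to its probability.
    """
    result = defaultdict(Fraction)
    for x1, a in p1.items():
        for x2, b in p2.items():
            result[x1 + x2] += a * b
    return dict(result)


def binomial_pmf(n, p):
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}


fair_die = {k: Fraction(1, 6) for k in range(1, 7)}
two_dice = convolve_pmfs(fair_die, fair_die)
print(two_dice[7])  # 1/6, i.e. 6/36

p = Fraction(1, 3)
print(convolve_pmfs(binomial_pmf(2, p), binomial_pmf(3, p)) == binomial_pmf(5, p))
```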




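In the same way that we prove (9.4) we may prove the formulas for
the probability density function of the difference, product, and quotient
of two jointly continuous random variables:

(9.8)  f_{X-Y}(y) = \int_{-\infty}^{\infty} dx \, f_{X,Y}(y + x, x) = \int_{-\infty}^{\infty} dx \, f_{X,Y}(x, x - y).

(9.9)  f_{XY}(y) = \int_{-\infty}^{\infty} dx \, \frac{1}{|x|} f_{X,Y}\left(\frac{y}{x}, x\right) = \int_{-\infty}^{\infty} dx \, \frac{1}{|x|} f_{X,Y}\left(x, \frac{y}{x}\right).

(9.10)  f_{X/Y}(y) = \int_{-\infty}^{\infty} dx \, |x| \, f_{X,Y}(yx, x).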


We next consider the function of two variables given by g(x_1, x_2) =
\sqrt{x_1^2 + x_2^2} and obtain the probability law of Y = \sqrt{X_1^2 + X_2^2}. Suppose
one is taking a walk in a plane; starting at the origin, one takes a step of
magnitude X_1 in one direction and then in a perpendicular direction one
takes a step of magnitude X_2. One will then be at a distance Y from the
origin given by Y = \sqrt{X_1^2 + X_2^2}. Similarly, suppose one is shooting at a
target; let X_1 and X_2 denote the coordinates of the shot, taken along
perpendicular axes, the center of which is the target. Then Y = \sqrt{X_1^2 + X_2^2}
is the distance from the target to the point hit by the shot.

The distribution function of Y = \sqrt{X_1^2 + X_2^2} clearly satisfies F_Y(y) = 0
for y < 0, and for y ≥ 0

(9.11)  F_Y(y) = \iint_{\{(x_1, x_2): x_1^2 + x_2^2 \le y^2\}} f_{X_1,X_2}(x_1, x_2) \, dx_1 \, dx_2.

We express the double integral in (9.11) by means of polar coordinates.
We have, letting x_1 = r cos θ, x_2 = r sin θ,

(9.12)  F_Y(y) = \int_0^{2\pi} d\theta \int_0^y r \, dr \, f_{X_1,X_2}(r \cos\theta, r \sin\theta).

If X_1 and X_2 are jointly continuous, then Y is continuous, with a
probability density function obtained by differentiating (9.12) with respect
to y. Consequently,

(9.13)  f_{\sqrt{X_1^2 + X_2^2}}(y) = y \int_0^{2\pi} d\theta \, f_{X_1,X_2}(y \cos\theta, y \sin\theta)   for y > 0
                                   = 0   for y < 0,

(9.14)  f_{X_1^2 + X_2^2}(y) = \frac{1}{2} \int_0^{2\pi} d\theta \, f_{X_1,X_2}(\sqrt{y} \cos\theta, \sqrt{y} \sin\theta)   for y > 0
                             = 0   for y < 0,

where (9.14) follows from (9.13) and (8.8).




The formulas given in this section provide tools for the solution of a 
great many problems of theoretical and applied probability theory, as 
examples 9A to 9F indicate. In particular, the important problem of 
finding the probability distribution of the sum of two independent random 
variables can be treated by using (9.5) and (9.7). One may prove results 
such as the following: 


THEOREM 9B. Let X_1 and X_2 be independent random variables.

(i) If X_1 is normally distributed with parameters m_1 and σ_1 and X_2 is
normally distributed with parameters m_2 and σ_2, then X_1 + X_2 is normally
distributed with parameters m = m_1 + m_2 and σ = \sqrt{σ_1^2 + σ_2^2}.

(ii) If X_1 obeys a binomial probability law with parameters n_1 and p
and X_2 obeys a binomial probability law with parameters n_2 and p, then
X_1 + X_2 obeys a binomial probability law with parameters n_1 + n_2 and p.

(iii) If X_1 is Poisson distributed with parameter λ_1 and X_2 is Poisson
distributed with parameter λ_2, then X_1 + X_2 is Poisson distributed with
parameter λ = λ_1 + λ_2.

(iv) If X_1 obeys a Cauchy probability law with parameters a_1 and b_1
and X_2 obeys a Cauchy probability law with parameters a_2 and b_2, then
X_1 + X_2 obeys a Cauchy probability law with parameters a_1 + a_2 and
b_1 + b_2.

(v) If X_1 obeys a gamma probability law with parameters r_1 and λ and
X_2 obeys a gamma probability law with parameters r_2 and λ, then X_1 + X_2
obeys a gamma probability law with parameters r_1 + r_2 and λ.

A proof of part (i) of theorem 9B is given in example 9A. The other
parts of theorem 9B are left to the reader as exercises. A proof of theorem
9B from another point of view is given in section 4 of Chapter 9.


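► Example 9A. Let X_1 and X_2 be independent random variables; X_1
is normally distributed with parameters m_1 and σ_1, whereas X_2 is normally
distributed with parameters m_2 and σ_2. Show that their sum X_1 + X_2 is
normally distributed, with parameters m and σ satisfying the relations

(9.15)  m = m_1 + m_2,   σ^2 = σ_1^2 + σ_2^2.

Solution: By (9.5),

f_{X_1+X_2}(y) = \frac{1}{2\pi\sigma_1\sigma_2} \int_{-\infty}^{\infty} \exp\left[ -\frac{1}{2}\left(\frac{x - m_1}{\sigma_1}\right)^2 - \frac{1}{2}\left(\frac{y - x - m_2}{\sigma_2}\right)^2 \right] dx.

By (6.9) of Chapter 4, it follows that

(9.16)  f_{X_1+X_2}(y) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{1}{2}\left(\frac{y - m_1 - m_2}{\sigma}\right)^2 \right] \left\{ \frac{1}{\sqrt{2\pi}\,\sigma^*} \int_{-\infty}^{\infty} \exp\left[ -\frac{1}{2}\left(\frac{x - m^*}{\sigma^*}\right)^2 \right] dx \right\},

where

m^* = \frac{m_1 \sigma_2^2 + (y - m_2)\sigma_1^2}{\sigma_1^2 + \sigma_2^2},   \sigma^{*2} = \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2}.

However, the expression in braces in equation (9.16) is equal to 1. Therefore,
it follows that X_1 + X_2 is normally distributed with parameters m
and σ, given by (9.15). ◄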
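Example 9A can also be checked numerically. The sketch below evaluates the convolution integral (9.5) for two normal densities by the midpoint rule. It then compares the result with the normal density whose parameters are given by (9.15). The parameter values and helper name are illustrative assumptions.

```python
import math
from statistics import NormalDist


def density_of_sum(d1, d2, y, steps=20_000, half_width=12.0):
    """Evaluate (9.5) for independent X1 ~ d1, X2 ~ d2 by the midpoint rule over the bulk of d1."""
    lo = d1.mean - half_width * d1.stdev
    width = 2.0 * half_width * d1.stdev / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width
        total += d1.pdf(x) * d2.pdf(y - x)
    return total * width


X1 = NormalDist(1.0, 2.0)    # m1 = 1, sigma1 = 2 (illustrative values)
X2 = NormalDist(-3.0, 1.5)   # m2 = -3, sigma2 = 1.5
predicted = NormalDist(X1.mean + X2.mean, math.hypot(X1.stdev, X2.stdev))  # parameters from (9.15)

for y in (-5.0, -2.0, 0.0, 3.0):
    print(y, density_of_sum(X1, X2, y), predicted.pdf(y))
```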


► Example 9B. The assembly of parts. It is often the case that a dimension
of an assembled article is the sum of the dimensions of several parts. 
An electrical resistance may be the sum of several electrical resistances. 
The weight or thickness of the article may be the sum of the weights or 
thicknesses of individual parts. The probability law of the individual 
dimensions may be known; what is of interest is the probability law of 
the dimension of the assembled article. An answer to this question may be 
obtained from (9.5) and (9.7) if the individual dimensions are independent 
random variables. For example, let us consider two 10-ohm resistors 
assembled in series. Suppose that, in fact, the resistances of the resistors 
are independent random variables, each obeying a normal probability 
law with mean 10 ohms and standard deviation 0.5 ohms. The unit, 
consisting of the two resistors assembled in series, has resistance equal to 
the sum of the individual resistances; therefore, the resistance of the unit 
obeys a normal probability law with mean 20 ohms and standard deviation 
[(0.5)^2 + (0.5)^2]^{1/2} = 0.707 ohms. Now suppose one wishes to measure
the resistance of the unit, using an ohmmeter whose error of measurement
is a random variable obeying a normal probability law with mean 0 and
standard deviation 0.5 ohms. The measured resistance of the unit is a
random variable obeying a normal probability law with mean 20 ohms
and standard deviation \sqrt{(0.707)^2 + (0.5)^2} = 0.866 ohms. ◄
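The arithmetic of example 9B can be reproduced with Python's statistics.NormalDist. According to its documentation, the + operator treats the two operands as independent normal random variables, so the means add and the variances add. As in the text, the meter error is taken to be independent of the resistances.

```python
from statistics import NormalDist

resistor = NormalDist(10.0, 0.5)      # each resistor: mean 10 ohms, standard deviation 0.5 ohm
meter_error = NormalDist(0.0, 0.5)    # ohmmeter error: mean 0, standard deviation 0.5 ohm

# NormalDist + NormalDist treats the two as independent: means add, variances add.
unit = resistor + resistor
measured = unit + meter_error

print(unit.mean, round(unit.stdev, 3))          # 20.0 0.707
print(measured.mean, round(measured.stdev, 3))  # 20.0 0.866
```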


► Example 9C. Let X_1 and X_2 be independent random variables, each
normally distributed with parameters m = 0 and σ > 0. Then

f_{X_1,X_2}(y \cos\theta, y \sin\theta) = \frac{1}{2\pi\sigma^2} e^{-y^2/(2\sigma^2)}.

Consequently, for y > 0

(9.17)  f_{\sqrt{X_1^2 + X_2^2}}(y) = \frac{y}{\sigma^2} e^{-y^2/(2\sigma^2)},

(9.18)  f_{X_1^2 + X_2^2}(y) = \frac{1}{2\sigma^2} e^{-y/(2\sigma^2)}.

In words, \sqrt{X_1^2 + X_2^2} has a Rayleigh distribution with parameter σ,
whereas X_1^2 + X_2^2 has a χ^2 distribution with parameters n = 2 and σ. ◄
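The sketch below checks (9.13) and (9.14) numerically in the setting of example 9C. It integrates the joint density around a circle by the midpoint rule and compares the results with (9.17) and (9.18). The value σ = 1.7 is an arbitrary illustrative choice, and the helper names are hypothetical.

```python
import math


def polar_density(joint_pdf, y, steps=2_000):
    """(9.13): f_Y(y) = y * integral over [0, 2 pi] of joint_pdf(y cos t, y sin t) dt, for y > 0."""
    width = 2.0 * math.pi / steps
    total = sum(joint_pdf(y * math.cos((k + 0.5) * width), y * math.sin((k + 0.5) * width))
                for k in range(steps))
    return y * total * width


sigma = 1.7  # illustrative value of the common standard deviation


def independent_normals_pdf(x1, x2):
    """Joint density of independent X1, X2, each normal with m = 0 and the given sigma."""
    return math.exp(-(x1 * x1 + x2 * x2) / (2 * sigma**2)) / (2 * math.pi * sigma**2)


def rayleigh_pdf(y):
    """(9.17)."""
    return y / sigma**2 * math.exp(-y * y / (2 * sigma**2))


def sum_of_squares_pdf(y):
    """(9.14): (1/2) * integral of joint_pdf(sqrt(y) cos t, sqrt(y) sin t) dt."""
    return polar_density(independent_normals_pdf, math.sqrt(y)) / (2.0 * math.sqrt(y))


for y in (0.5, 1.7, 4.0):
    print(y, polar_density(independent_normals_pdf, y), rayleigh_pdf(y))
```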
► Example 9D. The probability distribution of the envelope of narrow-band
noise. A family of random variables X(t), defined for t > 0, is
said to represent a narrow-band noise voltage [see S. O. Rice, "Mathematical
Analysis of Random Noise," Bell System Tech. Jour., Vol. 24
(1945), p. 81] if X(t) is represented in the form

(9.19)  X(t) = X_1(t) \cos\omega t + X_2(t) \sin\omega t,

in which ω is a known frequency, whereas X_1(t) and X_2(t) are independent
normally distributed random variables with means 0 and equal variances σ^2.
The envelope of X(t) is then defined as

(9.20)  R(t) = [X_1^2(t) + X_2^2(t)]^{1/2}.

In view of example 9C, it is seen that the envelope R(t) has a Rayleigh
distribution with parameter α = σ. ◄
► Example 9E. Let U and V be independent random variables, such
that U is normally distributed with mean 0 and variance σ^2 and V has a χ
distribution with parameters n and σ. Show that the quotient T = U/V
has Student's distribution with parameter n.

Solution: By (9.10), the probability density function of T for any real
number y is given by

f_T(y) = \int_0^\infty dv \, v \, f_U(yv) f_V(v)
       = K \int_0^\infty dv \, v \exp\left[ -\frac{1}{2}\left(\frac{yv}{\sigma}\right)^2 \right] v^{n-1} \exp\left[ -\frac{n}{2}\left(\frac{v}{\sigma}\right)^2 \right],

where

K = \frac{2 (n/2)^{n/2}}{\Gamma(n/2)\, \sigma^n} \cdot \frac{1}{\sqrt{2\pi}\, \sigma}.

By making the change of variable u = v\sqrt{y^2 + n}/σ, it follows that

f_T(y) = K \sigma^{n+1} (y^2 + n)^{-(n+1)/2} \int_0^\infty du \, u^n e^{-u^2/2}
       = K \sigma^{n+1} (y^2 + n)^{-(n+1)/2} \, 2^{(n-1)/2} \, \Gamma\left(\frac{n+1}{2}\right),

from which one may immediately deduce that the probability density
function of T is given by (4.15) of Chapter 4. ◄
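The sketch below checks example 9E numerically. It evaluates the quotient formula (9.10) by the midpoint rule and compares it with the usual Student's t density Γ((n+1)/2)/[√(nπ) Γ(n/2)] (1 + t^2/n)^{-(n+1)/2}. It assumes the χ density with parameters n and σ is 2(n/2)^{n/2} v^{n-1} e^{-nv^2/(2σ^2)}/[Γ(n/2) σ^n], which is the normalization consistent with the constant K in the solution. The values n = 5 and σ = 2 are illustrative. The final assertion in the check below also shows that the result does not depend on σ.

```python
import math


def chi_pdf(v, n, sigma):
    """Assumed chi density with parameters n and sigma (the normalization implied by K above)."""
    if v <= 0.0:
        return 0.0
    c = 2.0 * (n / 2.0) ** (n / 2.0) / (math.gamma(n / 2.0) * sigma**n)
    return c * v ** (n - 1) * math.exp(-n * v * v / (2.0 * sigma**2))


def normal_pdf(u, sigma):
    """Normal density with mean 0 and variance sigma^2."""
    return math.exp(-u * u / (2.0 * sigma**2)) / (math.sqrt(2.0 * math.pi) * sigma)


def quotient_density(t, n, sigma, steps=20_000):
    """(9.10) for independent U and V > 0: f_T(t) = integral over v > 0 of v f_U(t v) f_V(v) dv."""
    upper = 10.0 * sigma  # the chi density is negligible beyond this point
    width = upper / steps
    total = 0.0
    for k in range(steps):
        v = (k + 0.5) * width
        total += v * normal_pdf(t * v, sigma) * chi_pdf(v, n, sigma)
    return total * width


def student_t_pdf(t, n):
    """Student's t density with n degrees of freedom."""
    return (math.gamma((n + 1) / 2) / (math.sqrt(n * math.pi) * math.gamma(n / 2))
            * (1.0 + t * t / n) ** (-(n + 1) / 2))


n, sigma = 5, 2.0  # illustrative values
for t in (0.0, 1.3, -2.5):
    print(t, quotient_density(t, n, sigma), student_t_pdf(t, n))
```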


► Example 9F. Distribution of the range. A ship is shelling a target on
an enemy shore line, firing n independent shots, all of which may be
assumed to fall on a straight line and to be distributed according to the
distribution function F(x) with probability density function f(x). Define
the range (or span) R of the attack as the interval between the location of
the extreme shells. Find the probability density function of R.

Solution: Let X_1, X_2, ..., X_n be independent random variables representing
the coordinates locating the position of the n shots. The range R
may be written R = V - U, in which V = maximum (X_1, X_2, ..., X_n)
and U = minimum (X_1, X_2, ..., X_n). The joint distribution function
F_{U,V}(u, v) is found as follows. If u ≥ v, then F_{U,V}(u, v) is the probability
that simultaneously X_1 ≤ v, ..., X_n ≤ v; consequently,

(9.21)  F_{U,V}(u, v) = [F(v)]^n   if u \ge v,

since P[X_k ≤ v] = F(v) for k = 1, 2, ..., n. If u < v, then F_{U,V}(u, v) is
the probability that simultaneously X_1 ≤ v, ..., X_n ≤ v but not simultaneously
u < X_1 ≤ v, ..., u < X_n ≤ v; consequently,

(9.22)  F_{U,V}(u, v) = [F(v)]^n - [F(v) - F(u)]^n   if u < v.

The joint probability density of U and V is then obtained by differentiation.
It is given by

(9.23)  f_{U,V}(u, v) = 0   if u > v
                      = n(n - 1) [F(v) - F(u)]^{n-2} f(u) f(v)   if u < v.

The probability density function of the range R = V - U is then given by

(9.24)  f_R(x) = \int_{-\infty}^{\infty} f_{U,V}(v - x, v) \, dv
              = 0   for x < 0
              = n(n - 1) \int_{-\infty}^{\infty} [F(v) - F(v - x)]^{n-2} f(v - x) f(v) \, dv   for x > 0.

The distribution function of R is then given by

(9.25)  F_R(x) = 0   for x < 0
              = n \int_{-\infty}^{\infty} [F(v) - F(v - x)]^{n-1} f(v) \, dv   for x \ge 0.

Equations (9.24) and (9.25) can be explicitly evaluated only in a few cases,
such as that in which each random variable X_1, X_2, ..., X_n is uniformly
distributed on the interval 0 to 1. Then from (9.24) it follows that

(9.26)  f_R(x) = n(n - 1) x^{n-2} (1 - x)   for 0 < x < 1
              = 0   elsewhere. ◄
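The sketch below is a Monte Carlo check of the uniform case. Integrating (9.26) gives P[R ≤ x] = n x^{n-1} - (n - 1) x^n for 0 ≤ x ≤ 1, which is also what (9.25) yields when F is uniform. The simulation draws n = 5 shots (an illustrative value) and compares the observed frequency of R ≤ 0.6 with this formula. The function names are hypothetical.

```python
import random


def range_pdf_uniform(x, n):
    """(9.26): density of the range of n independent uniform (0, 1) observations."""
    return n * (n - 1) * x ** (n - 2) * (1.0 - x) if 0.0 < x < 1.0 else 0.0


def range_cdf_uniform(x, n):
    """Integral of (9.26) from 0 to x: n x^(n-1) - (n-1) x^n on [0, 1]."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return n * x ** (n - 1) - (n - 1) * x**n


def simulated_range_cdf(x, n, trials=50_000, seed=11):
    """Monte Carlo estimate of P[R <= x], with R = max - min of n uniform shots."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        shots = [rng.random() for _ in range(n)]
        hits += (max(shots) - min(shots)) <= x
    return hits / trials


n = 5
print(range_cdf_uniform(0.6, n), simulated_range_cdf(0.6, n))
```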


A Geometrical Method for Finding the Probability Law of a Function
of Several Random Variables. Consider jointly continuous random
variables X_1, X_2, ..., X_n and the random variable Y = g(X_1, X_2, ..., X_n).
Suppose that the joint probability density function of X_1, X_2, ..., X_n has
the property that it is constant on the surface in n-dimensional space
obtained by setting g(x_1, ..., x_n) equal to a constant; more precisely,
suppose that there is a function of a real variable, denoted by f_g(·), such
that

(9.27)  f_{X_1, ..., X_n}(x_1, ..., x_n) = f_g(y)   if g(x_1, ..., x_n) = y.

If (9.27) holds and g(·) is a continuous function, we obtain a simple
formula for the probability density function f_Y(·) of the random variable
Y = g(X_1, X_2, ..., X_n); for any real number y

(9.28)  f_Y(y) = f_g(y) \frac{dV_g(y)}{dy},

in which V_g(y) represents the volume within the surface in n-dimensional
space with equation g(x_1, x_2, ..., x_n) = y; in symbols,

(9.29)  V_g(y) = \int \cdots \int_{\{(x_1, ..., x_n): g(x_1, ..., x_n) \le y\}} dx_1 \cdots dx_n.

We sketch a proof of (9.28). Let B(y; h) = {(x_1, ..., x_n): y <
g(x_1, ..., x_n) ≤ y + h}. Then, by the law of the mean for integrals,

F_Y(y + h) - F_Y(y) = \int \cdots \int_{B(y; h)} f_{X_1, ..., X_n}(x_1, ..., x_n) \, dx_1 \cdots dx_n
                    = f_{X_1, ..., X_n}(x_1', ..., x_n') [V_g(y + h) - V_g(y)]

for some point (x_1', ..., x_n') in the set B(y; h). Now, as h tends to 0,
f_{X_1, ..., X_n}(x_1', ..., x_n') tends to f_g(y), assuming f_g(·) is a continuous
function, and [V_g(y + h) - V_g(y)]/h tends to dV_g(y)/dy. From these facts,
one immediately obtains (9.28).

We illustrate the use of (9.28) by obtaining a basic formula, which
generalizes example 9C.




► Example 9G. Let X_1, X_2, ..., X_n be independent random variables,
each normally distributed with mean 0 and variance 1. Let Y =
\sqrt{X_1^2 + X_2^2 + \cdots + X_n^2}. Show that

(9.30)  f_Y(y) = \frac{y^{n-1} e^{-y^2/2}}{\int_0^\infty y^{n-1} e^{-y^2/2} \, dy}   for y > 0
               = 0   for y < 0,

where \int_0^\infty y^{n-1} e^{-y^2/2} \, dy = 2^{(n-2)/2} \Gamma(n/2). In words, Y has a χ distribution
with parameters n and σ = \sqrt{n}.

Solution: Define g(x_1, ..., x_n) = \sqrt{x_1^2 + \cdots + x_n^2} and f_g(y) =
(2π)^{-n/2} e^{-y^2/2}. Then (9.27) holds. Now V_g(y) is the volume within a
sphere in n-dimensional space of radius y. Clearly, V_g(y) = 0 for y < 0,
and for y ≥ 0

V_g(y) = \int \cdots \int_{\{x_1^2 + \cdots + x_n^2 \le y^2\}} dx_1 \cdots dx_n = y^n \int \cdots \int_{\{u_1^2 + \cdots + u_n^2 \le 1\}} du_1 \cdots du_n,

so that V_g(y) = K y^n for some constant K. Then dV_g(y)/dy = nK y^{n-1}. By
(9.28), f_Y(y) = 0 for y < 0, and for y > 0 f_Y(y) = K' y^{n-1} e^{-y^2/2} for some
constant K'. To obtain K', use the normalization condition \int_{-\infty}^{\infty} f_Y(y) \, dy
= 1. The proof of (9.30) is complete. ◄
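The constants in example 9G can be checked directly. The sketch below codes three things: the volume formula for an n-dimensional sphere (stated as (9.34) among the theoretical exercises); the closed form of the normalizing integral in (9.30); and a check that f_g(y) dV_g(y)/dy from (9.28) integrates to 1 when V_g(y) = V_n(1) y^n. The function names are hypothetical.

```python
import math


def ball_volume(n, r=1.0):
    """(9.34): V_n(r) = pi^(n/2) r^n / Gamma(n/2 + 1)."""
    return math.pi ** (n / 2) * r**n / math.gamma(n / 2 + 1)


def chi_normalizer(n):
    """Closed form quoted after (9.30): integral_0^inf y^(n-1) e^(-y^2/2) dy = 2^((n-2)/2) Gamma(n/2)."""
    return 2 ** ((n - 2) / 2) * math.gamma(n / 2)


def chi_normalizer_numeric(n, upper=40.0, steps=40_000):
    """Midpoint-rule approximation of the same integral."""
    width = upper / steps
    return width * sum(((k + 0.5) * width) ** (n - 1) * math.exp(-((k + 0.5) * width) ** 2 / 2)
                       for k in range(steps))


def geometric_method_total_mass(n):
    """Integral over y > 0 of f_g(y) dV_g/dy from (9.28), with f_g(y) = (2 pi)^(-n/2) e^(-y^2/2)
    and dV_g/dy = n V_n(1) y^(n-1); it should equal 1."""
    return (2 * math.pi) ** (-n / 2) * n * ball_volume(n) * chi_normalizer(n)


for n in (1, 2, 3, 5):
    print(n, ball_volume(n), geometric_method_total_mass(n))
```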


► Example 9H. The energy of an ideal gas is χ^2 distributed. Consider
an ideal gas composed of N particles of respective masses m_1, m_2, ..., m_N.
Let v_x^{(i)}, v_y^{(i)}, v_z^{(i)} denote the velocity components at a given time instant of
the ith particle. Assume that the total energy E of the gas is given by its
kinetic energy

E = \sum_{i=1}^{N} \frac{m_i}{2} \left[ (v_x^{(i)})^2 + (v_y^{(i)})^2 + (v_z^{(i)})^2 \right].

Assume that the joint probability density function of the 3N velocities
v_x^{(1)}, v_y^{(1)}, v_z^{(1)}, v_x^{(2)}, v_y^{(2)}, v_z^{(2)}, ..., v_x^{(N)}, v_y^{(N)}, v_z^{(N)}
is proportional to e^{-E/kT}, in which k is Boltzmann's constant and T is the
absolute temperature of the gas. The probability density function of the
energy E of the gas may then be derived by the geometrical method. For x > 0

f_E(x) = K_1 e^{-x/kT} \frac{dV(x)}{dx}

for some constant K_1, in which V(x) is the volume within the ellipsoid in
3N-dimensional space consisting of all 3N-tuples of velocities whose
kinetic energy E ≤ x. One may show that

V(x) = K_2 x^{3N/2},   \frac{dV(x)}{dx} = \frac{3N}{2} K_2 x^{(3N/2) - 1}

for some constant K_2, in the same way that V_g(y) is shown in example 9G
to be proportional to y^n. Consequently, for x > 0

f_E(x) = \frac{x^{(3N/2)-1} e^{-x/kT}}{\int_0^\infty x^{(3N/2)-1} e^{-x/kT} \, dx} = \frac{x^{(3N/2)-1} e^{-x/kT}}{(kT)^{3N/2} \, \Gamma(3N/2)}.

In words, E has a χ^2 distribution with parameters n = 3N and σ^2 = kT/2. ◄


We leave it for the reader to verify the validity of the next example. 


► Example 9I. The joint normal distribution. Consider two jointly
normally distributed random variables X_1 and X_2; that is, X_1 and X_2 have
a joint probability density function

(9.31)  f_{X_1, X_2}(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} e^{-Q(x_1, x_2)}

for some constants σ_1 > 0, σ_2 > 0, -1 < ρ < 1, -∞ < m_1 < ∞,
-∞ < m_2 < ∞, in which the function Q(·, ·) for any two real numbers
x_1 and x_2 is defined by

Q(x_1, x_2) = \frac{1}{2(1 - \rho^2)} \left[ \left(\frac{x_1 - m_1}{\sigma_1}\right)^2 - 2\rho \frac{(x_1 - m_1)(x_2 - m_2)}{\sigma_1 \sigma_2} + \left(\frac{x_2 - m_2}{\sigma_2}\right)^2 \right].

The curve Q(x_1, x_2) = constant is an ellipse. Let Y = Q(X_1, X_2). Then

P[Y > y] = e^{-y}   for y > 0. ◄
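The claim P[Y > y] = e^{-y} can be tested by simulation. The sketch below draws from (9.31) by writing X_1 = m_1 + σ_1 Z_1 and X_2 = m_2 + σ_2[ρZ_1 + (1 - ρ^2)^{1/2} Z_2], with Z_1 and Z_2 independent standard normal. This is a standard construction, not one taken from the text. The parameter values and helper names are illustrative.

```python
import math
import random


def q_form(x1, x2, m1, m2, s1, s2, rho):
    """Q(x1, x2) of Example 9I."""
    u1, u2 = (x1 - m1) / s1, (x2 - m2) / s2
    return (u1 * u1 - 2.0 * rho * u1 * u2 + u2 * u2) / (2.0 * (1.0 - rho * rho))


def sample_joint_normal(rng, m1, m2, s1, s2, rho):
    """One draw from (9.31), built from two independent standard normals."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return m1 + s1 * z1, m2 + s2 * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)


params = dict(m1=1.0, m2=-2.0, s1=2.0, s2=0.5, rho=0.6)  # illustrative values
rng = random.Random(3)
qs = [q_form(*sample_joint_normal(rng, **params), **params) for _ in range(100_000)]

for y in (0.5, 1.0, 2.0):
    print(y, sum(q > y for q in qs) / len(qs), math.exp(-y))
```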


THEORETICAL EXERCISES

Various probability laws (or, equivalently, probability distributions), which
are of importance in statistics, arise as the probability laws of various
functions of normally distributed random variables.

9.1. The χ^2 distribution. Show that if X_1, X_2, ..., X_n are independent
random variables, each normally distributed with parameters m = 0 and
σ > 0, and if Z = X_1^2 + X_2^2 + ... + X_n^2, then Z has a χ^2 distribution
with parameters n and σ.


9.2. The χ distribution. Show that if X_1, X_2, ..., X_n are independent random
variables, each normally distributed with parameters m = 0 and σ > 0,
then

\left( \frac{1}{n} \sum_{k=1}^{n} X_k^2 \right)^{1/2}

has a χ distribution with parameters n and σ.

9.3. Student's distribution. Show that if X_0, X_1, ..., X_n are (n + 1) independent
random variables, each normally distributed with parameters m = 0
and σ > 0, then the random variable

X_0 \Big/ \left( \frac{1}{n} \sum_{k=1}^{n} X_k^2 \right)^{1/2}

has as its probability law Student's distribution with parameter n (which,
it should be noted, is independent of σ)!

9.4. The F distribution. Show that if Z_1 and Z_2 are independent random
variables, χ^2 distributed with n_1 and n_2 degrees of freedom, respectively,
then the quotient n_2 Z_1/(n_1 Z_2) obeys the F distribution with parameters n_1
and n_2. Consequently, conclude that if X_1, ..., X_m, X_{m+1}, ..., X_{m+n}
are (m + n) independent random variables, each normally distributed with
parameters m = 0 and σ > 0, then the random variable

\frac{(1/m) \sum_{k=1}^{m} X_k^2}{(1/n) \sum_{k=1}^{n} X_{m+k}^2}

obeys the F distribution with parameters m and n.

9.5. Show that if X_1 has a binomial distribution with parameters n_1 and p,
if X_2 has a binomial distribution with parameters n_2 and p, and X_1 and
X_2 are independent, then X_1 + X_2 has a binomial distribution with
parameters n_1 + n_2 and p.

9.6. Show that if X_1 has a Poisson distribution with parameter λ_1, if X_2 has a
Poisson distribution with parameter λ_2, and X_1 and X_2 are independent,
then X_1 + X_2 is Poisson distributed with parameter λ_1 + λ_2.

9.7. Show that if X_1 and X_2 are independently and uniformly distributed over
the interval a to b, then

(9.32)  f_{X_1+X_2}(y) = 0   for y < 2a or y > 2b
                       = \frac{y - 2a}{(b - a)^2}   for 2a < y < a + b
                       = \frac{2b - y}{(b - a)^2}   for a + b < y < 2b.

9.8. Prove the validity of the assertion made in example 9I. Identify the
probability law of Y. Find the probability law of Z = 2(1 - ρ^2)Y.

9.9. Let X_1 and X_2 have a joint probability density function given by (9.31).
Show that the sum X_1 + X_2 is normally distributed, with parameters
m = m_1 + m_2 and σ^2 = σ_1^2 + 2ρσ_1σ_2 + σ_2^2.

9.10. Let X_1 and X_2 have a joint probability density function given by equation
(9.31), with m_1 = m_2 = 0. Show that

(9.33)  f_{X_1/X_2}(y) = \frac{\sigma_1 \sigma_2 \sqrt{1 - \rho^2}}{\pi (\sigma_2^2 y^2 - 2\rho\sigma_1\sigma_2 y + \sigma_1^2)}.

9.11. If X_1 and X_2 are independent, then the quotient X_1/X_2 has a Cauchy
distribution.

9.12. Use the proof of example 9G to prove that the volume V_n(r) of an
n-dimensional sphere of radius r is given by

(9.34)  V_n(r) = \int \cdots \int_{\{x_1^2 + \cdots + x_n^2 \le r^2\}} dx_1 \, dx_2 \cdots dx_n = \frac{\pi^{n/2}}{\Gamma(n/2 + 1)} r^n.

Prove that the surface area of the sphere is given by dV_n(r)/dr.

9.13. Prove that it is impossible for two independent identically distributed
random variables, X_1 and X_2, each taking the values 1 to 6, to have the
property that P[X_1 + X_2 = k] = 1/11 for k = 2, 3, ..., 12.
Conclude that it is impossible to weight a pair of dice so that the probability
of occurrence of every sum from 2 to 12 will be the same.

9.14. Prove that if two independent identically distributed random variables, X_1
and X_2, each taking the values 1 to 6, have the property that their sum will
satisfy P[X_1 + X_2 = k] = P[X_1 + X_2 = 14 - k] = (k - 1)/36 for k = 2,
3, 4, 5, 6, and P[X_1 + X_2 = 7] = 6/36, then P[X_1 = k] = P[X_2 = k] = 1/6 for
k = 1, 2, ..., 6.


EXERCISES 


9.1. Suppose that the load on an airplane wing is a random variable X obeying
a normal probability law with mean 1000 and variance 14,400, whereas
the load Y that the wing can withstand is a random variable obeying a
normal probability law with mean 1260 and variance 2500. Assuming that
X and Y are independent, find the probability that X < Y (that the load
encountered by the wing is less than the load the wing can withstand).


In exercises 9.2 to 9.4 let X_1 and X_2 be independently and uniformly distributed
over the interval 0 to 1.

9.2. Find and sketch the probability density function of (i) X_1 + X_2, (ii)
X_1 - X_2, (iii) |X_1 - X_2|.

9.3. (i) Maximum (X_1, X_2), (ii) minimum (X_1, X_2).

9.4. (i) X_1 X_2, (ii) X_1/X_2.


In exercises 9.5 to 9.7 let $X_1$ and $X_2$ be independent random variables, each normally distributed with parameters $m = 0$ and $\sigma > 0$.

9.5. Find and sketch the probability density function of (i) $X_1 + X_2$, (ii) $X_1 - X_2$, (iii) $|X_1 - X_2|$, (iv) $(X_1 + X_2)/2$, (v) $(X_1 - X_2)/2$.

9.6. (i) $X_1^2 + X_2^2$, (ii) $(X_1^2 + X_2^2)/2$.

9.7. (i) $X_1/X_2$, (ii) $X_1/|X_2|$.

9.8. Let $X_1$, $X_2$, $X_3$, and $X_4$ be independent random variables, each normally distributed with parameters $m = 0$ and $\sigma^2 = 1$. Find and sketch the probability density functions of (i) $X_1/\sqrt{(X_2^2 + X_3^2)/2}$, (ii) $2X_1^2/(X_2^2 + X_3^2)$, (iii) $3X_1^2/(X_2^2 + X_3^2 + X_4^2)$, (iv) $(X_1^2 + X_2^2)/(X_3^2 + X_4^2)$.

9.9. Let $X_1$, $X_2$, and $X_3$ be independent random variables, each exponentially distributed with parameter $\lambda = 1$. Find the probability density function of (i) $X_1 + X_2 + X_3$, (ii) minimum $(X_1, X_2, X_3)$, (iii) maximum $(X_1, X_2, X_3)$, (iv) $X_1 X_2$.

9.10. Find and sketch the probability density function of $\theta = \tan^{-1}(Y/X)$ if $X$ and $Y$ are independent random variables, each normally distributed with mean 0 and variance $\sigma^2$.

9.11. The envelope of a narrow-band noise is sampled periodically, the samples being sufficiently far apart to assure independence. In this way $n$ independent random variables $X_1, X_2, \ldots, X_n$ are observed, each of which is Rayleigh distributed with parameter $\sigma$. Let $Y = $ maximum $(X_1, X_2, \ldots, X_n)$ be the largest value in the sample. Find the probability density function of $Y$.

9.12. Let $v = (v_x^2 + v_y^2 + v_z^2)^{1/2}$ be the magnitude of the velocity of a particle whose velocity components $v_x$, $v_y$, $v_z$ are independent random variables, each normally distributed with mean 0 and variance $kT/M$; $k$ is Boltzmann's constant, $T$ is the absolute temperature of the medium in which the particle is immersed, and $M$ is the mass of the particle. Describe the probability law of $v$.

9.13. Let $X_1, X_2, \ldots, X_n$ be independent random variables, uniformly distributed over the interval 0 to 1. Describe the probability law of $-2\log(X_1 X_2 \cdots X_n)$. Using this result, describe a procedure for forming a random sample of a random variable with a $\chi^2$ distribution with $2n$ degrees of freedom.

9.14. Let $X$ and $Y$ be independent random variables, each exponentially distributed with parameter $\lambda$. Find the probability density function of $Z = X/(X + Y)$.
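Exercise 9.13 can be explored by simulation. The sketch below takes as given the standard fact that $-2\log U$ is $\chi^2$ distributed with 2 degrees of freedom when $U$ is uniform on the interval 0 to 1, so that $-2\log(U_1 \cdots U_n)$ should have mean $2n$ and variance $4n$. The function name `chi_square_2n`, the seed, and the sample size are illustrative choices.

```python
import math
import random


def chi_square_2n(n: int, rng: random.Random) -> float:
    """One observation of -2 log(U_1 U_2 ... U_n), with U_i iid uniform on (0, 1].

    Summing logarithms avoids the underflow that forming the product directly
    would cause for large n; 1 - rng.random() lies in (0, 1], so log is defined.
    """
    return -2.0 * sum(math.log(1.0 - rng.random()) for _ in range(n))


rng = random.Random(2024)   # fixed seed so the run is reproducible
n = 3                       # target: chi-square with 2n = 6 degrees of freedom
sample = [chi_square_2n(n, rng) for _ in range(100_000)]

mean = sum(sample) / len(sample)
var = sum((s - mean) ** 2 for s in sample) / (len(sample) - 1)
print(f"sample mean {mean:.3f} (chi-square mean {2 * n})")
print(f"sample variance {var:.3f} (chi-square variance {4 * n})")
```

Agreement of the sample mean and variance with $2n$ and $4n$ is only a rough check, not a proof of the distributional claim.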


9.15. Show that if $X_1, X_2, \ldots, X_n$ are independent identically distributed random variables whose minimum $Y = $ minimum $(X_1, X_2, \ldots, X_n)$ obeys an exponential probability law with parameter $\lambda$, then each of the random variables $X_1, \ldots, X_n$ obeys an exponential probability law with parameter $\lambda/n$. If you prefer to solve the problem for the special case that $n = 2$, this will suffice. Hint: $Y$ obeys an exponential probability law with parameter $\lambda$ if and only if $F_Y(y) = 1 - e^{-\lambda y}$ or 0, depending on whether $y \ge 0$ or $y < 0$.

SEC. 10 JOINT PROBABILITY LAW OF FUNCTIONS OF RANDOM VARIABLES 329

9.16. Let $X_1, X_2, \ldots, X_n$ be independent random variables (i) uniformly distributed on the interval $-1$ to 1, (ii) exponentially distributed with mean 2. Find the distribution of the range $R = $ maximum $(X_1, X_2, \ldots, X_n) - $ minimum $(X_1, X_2, \ldots, X_n)$.

9.17. Find the probability that in a random sample of size $n$ of a random variable uniformly distributed on the interval 0 to 1 the range will exceed 0.8.

9.18. Determine how large a random sample one must take of a random variable uniformly distributed on the interval 0 to 1 in order that the probability will be more than 0.95 that the range will exceed 0.90.

9.19. The random variable $X$ represents the amplitude of a sine wave; $Y$ represents the amplitude of a cosine wave. Both are independently and uniformly distributed over the interval 0 to 1.
(i) Let the random variable $R$ represent the amplitude of their resultant; that is, $R^2 = X^2 + Y^2$. Find and sketch the probability density function of $R$.
(ii) Let the random variable $\theta$ represent the phase angle of the resultant; that is, $\theta = \tan^{-1}(Y/X)$. Find and sketch the probability density function of $\theta$.

9.20. The noise output of a quadratic detector in a radio receiver can be represented as $X^2 + Y^2$, where $X$ and $Y$ are independently and normally distributed with parameters $m = 0$ and $\sigma > 0$. If, in addition to noise, there is a signal present, the output is represented by $(X + a)^2 + (Y + b)^2$, where $a$ and $b$ are given constants. Find the probability density function of the output of the detector, assuming that (i) noise alone is present, (ii) both signal and noise are present.

9.21. Consider 3 jointly distributed random variables $X$, $Y$, and $Z$ with a joint probability density function

$f_{X,Y,Z}(x, y, z) = \dfrac{6}{(1 + x + y + z)^4}$ for $x > 0$, $y > 0$, $z > 0$,
$= 0$ otherwise.

Find the probability density function of the sum $X + Y + Z$.
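For exercises 9.17 and 9.18 one may take as given the standard result that the range $R$ of $n$ independent observations uniformly distributed on 0 to 1 has distribution function $F_R(r) = n r^{n-1} - (n - 1) r^n$ for $0 \le r \le 1$ (it can be derived by the methods of Section 9). The sketch below evaluates $P[R > r]$ from that formula and searches for the smallest sample size; the function names are hypothetical.

```python
def prob_range_exceeds(r: float, n: int) -> float:
    """P[R > r] for the range R of n iid uniform(0, 1) observations, 0 <= r <= 1.

    Uses F_R(r) = n r^(n-1) - (n - 1) r^n, taken here as a known result.
    """
    return 1.0 - (n * r ** (n - 1) - (n - 1) * r ** n)


def smallest_sample_size(r: float, target: float) -> int:
    """Smallest n >= 2 for which P[R > r] exceeds target (the question of 9.18)."""
    n = 2
    while prob_range_exceeds(r, n) <= target:
        n += 1
    return n


for n in (2, 5, 10, 20):
    print(n, round(prob_range_exceeds(0.8, n), 4))   # exercise 9.17 for a few n
print(smallest_sample_size(0.90, 0.95))              # exercise 9.18
```

For $n = 2$ the range is $|X_1 - X_2|$, and the formula gives $P[R > 0.8] = 0.2^2 = 0.04$, which is a quick sanity check.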


10. THE JOINT PROBABILITY LAW OF FUNCTIONS 
OF RANDOM VARIABLES 


In Section 9 we treated in some detail the problem of obtaining the individual probability law of a function of random variables. It is natural to consider next the problem of obtaining the joint probability law of several random variables which arise as functions. In principle, this problem is no different from those previously considered. However, the details are more complicated. Consequently, in this section, we content ourselves with stating an often-used formula for the joint probability density function of $n$ random variables $Y_1, Y_2, \ldots, Y_n$, which arise as functions of $n$ jointly continuous random variables $X_1, X_2, \ldots, X_n$:


(10.1)   $Y_1 = g_1(X_1, X_2, \ldots, X_n), \quad Y_2 = g_2(X_1, X_2, \ldots, X_n), \quad \ldots, \quad Y_n = g_n(X_1, X_2, \ldots, X_n).$


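We consider only the case in which the functions $g_1(x_1, x_2, \ldots, x_n)$, $g_2(x_1, x_2, \ldots, x_n)$, ..., $g_n(x_1, x_2, \ldots, x_n)$ have continuous first partial derivatives at all points $(x_1, x_2, \ldots, x_n)$ and are such that the Jacobian

(10.2)   $J(x_1, x_2, \ldots, x_n) = \det\begin{pmatrix} \partial g_1/\partial x_1 & \partial g_1/\partial x_2 & \cdots & \partial g_1/\partial x_n \\ \partial g_2/\partial x_1 & \partial g_2/\partial x_2 & \cdots & \partial g_2/\partial x_n \\ \vdots & \vdots & & \vdots \\ \partial g_n/\partial x_1 & \partial g_n/\partial x_2 & \cdots & \partial g_n/\partial x_n \end{pmatrix} \neq 0$

at all points $(x_1, x_2, \ldots, x_n)$. Let $C$ be the set of points $(y_1, y_2, \ldots, y_n)$ such that the $n$ equations

(10.3)   $y_1 = g_1(x_1, x_2, \ldots, x_n), \quad y_2 = g_2(x_1, x_2, \ldots, x_n), \quad \ldots, \quad y_n = g_n(x_1, x_2, \ldots, x_n)$

possess at least one solution $(x_1, x_2, \ldots, x_n)$. The set of equations in (10.3) then possesses exactly one solution, which we denote by

(10.4)   $x_1 = g_1^{-1}(y_1, y_2, \ldots, y_n), \quad x_2 = g_2^{-1}(y_1, y_2, \ldots, y_n), \quad \ldots, \quad x_n = g_n^{-1}(y_1, y_2, \ldots, y_n).$

If $X_1, X_2, \ldots, X_n$ are jointly continuous random variables, whose joint probability density function is continuous at all but a finite number of points in the $(x_1, x_2, \ldots, x_n)$ space, then the random variables $Y_1, Y_2, \ldots, Y_n$ defined by (10.1) are jointly continuous with a joint probability density function given by

(10.5)   $f_{Y_1, \ldots, Y_n}(y_1, \ldots, y_n) = f_{X_1, \ldots, X_n}(x_1, \ldots, x_n)\, |J(x_1, \ldots, x_n)|^{-1}$

if $(y_1, y_2, \ldots, y_n)$ belongs to $C$, and $x_1, x_2, \ldots, x_n$ are given by (10.4); for $(y_1, y_2, \ldots, y_n)$ not belonging to $C$

(10.5')   $f_{Y_1, \ldots, Y_n}(y_1, \ldots, y_n) = 0.$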


It should be noted that (10.5) extends (8.18). We leave it to the reader to state a similar extension of (8.22).

We omit the proof that the random variables $Y_1, Y_2, \ldots, Y_n$ are jointly continuous and possess a joint probability density. We sketch a derivation of the formula given by (10.5) for the joint probability density function. One may show that for any real numbers $u_1, u_2, \ldots, u_n$

(10.6)   $f_{Y_1, \ldots, Y_n}(u_1, \ldots, u_n) = \lim_{h_1, \ldots, h_n \to 0} \dfrac{1}{h_1 h_2 \cdots h_n} P[u_1 < Y_1 \le u_1 + h_1, u_2 < Y_2 \le u_2 + h_2, \ldots, u_n < Y_n \le u_n + h_n].$

The probability on the right-hand side of (10.6) is equal to

(10.7)   $P[u_1 < g_1(X_1, \ldots, X_n) \le u_1 + h_1, \ldots, u_n < g_n(X_1, \ldots, X_n) \le u_n + h_n] = \int \cdots \int_D f_{X_1, \ldots, X_n}(x_1, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n,$

in which

$D = \{(x_1, \ldots, x_n) : u_1 < g_1(x_1, \ldots, x_n) \le u_1 + h_1, \ldots, u_n < g_n(x_1, \ldots, x_n) \le u_n + h_n\}.$

Now, if $(u_1, u_2, \ldots, u_n)$ does not belong to $C$, then for sufficiently small values of $h_1, h_2, \ldots, h_n$ there are no points $(x_1, x_2, \ldots, x_n)$ in $D$, and the probability in (10.7) is 0. Since the quantities in (10.6) whose limit is being taken are then 0 for sufficiently small values of $h_1, h_2, \ldots, h_n$, it follows that $f_{Y_1, \ldots, Y_n}(u_1, \ldots, u_n) = 0$ for $(u_1, u_2, \ldots, u_n)$ not in $C$, which is (10.5').

If $(u_1, u_2, \ldots, u_n)$ belongs to $C$, one uses the rule for change of variables in multiple integrals (see ..., New York, 1937, Vol. II, p. 253, or Apostol, Mathematical Analysis, Addison-Wesley, Reading, Massachusetts, 1957, p. 271) to transform the integral on the right-hand side of (10.7) to the integral

(10.8)   $\int_{u_1}^{u_1 + h_1} dy_1 \int_{u_2}^{u_2 + h_2} dy_2 \cdots \int_{u_n}^{u_n + h_n} dy_n\, f_{X_1, \ldots, X_n}(x_1, \ldots, x_n)\, |J(x_1, \ldots, x_n)|^{-1}$, with $x_i = g_i^{-1}(y_1, \ldots, y_n)$.

Replacing the probability on the right-hand side of (10.6) by the integral in (10.8) and taking the limits indicated in (10.6), we finally obtain (10.5).


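► Example 10A. Let $X_1$ and $X_2$ be jointly continuous random variables. Let $U_1 = X_1 + X_2$, $U_2 = X_1 - X_2$. For any real numbers $u_1$ and $u_2$ show that

(10.9)   $f_{U_1, U_2}(u_1, u_2) = \tfrac{1}{2} f_{X_1, X_2}\left(\dfrac{u_1 + u_2}{2}, \dfrac{u_1 - u_2}{2}\right).$

Solution: Let $g_1(x_1, x_2) = x_1 + x_2$ and $g_2(x_1, x_2) = x_1 - x_2$. The equations $u_1 = x_1 + x_2$ and $u_2 = x_1 - x_2$ clearly have as their solution $x_1 = (u_1 + u_2)/2$ and $x_2 = (u_1 - u_2)/2$. The Jacobian $J$ is given by

$J = \det\begin{pmatrix} \partial g_1/\partial x_1 & \partial g_1/\partial x_2 \\ \partial g_2/\partial x_1 & \partial g_2/\partial x_2 \end{pmatrix} = \det\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} = -2.$

In view of these facts, (10.9) is an immediate consequence of (10.5), since $|J|^{-1} = 1/2$. ◄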
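Formula (10.9) can be checked numerically in a case where the answer is known independently. The sketch assumes, purely as an illustration not taken from the example, that $X_1$ and $X_2$ are independent standard normal; then $U_1$ and $U_2$ are independent and normal with mean 0 and variance 2, so (10.9) should agree with the product of two such densities.

```python
import math


def f_x(x1: float, x2: float) -> float:
    """Joint density of two independent N(0, 1) variables (illustrative choice)."""
    return math.exp(-(x1 * x1 + x2 * x2) / 2.0) / (2.0 * math.pi)


def f_u_from_10_9(u1: float, u2: float) -> float:
    """Density of (U1, U2) = (X1 + X2, X1 - X2) via (10.9); |J| = 2."""
    return 0.5 * f_x((u1 + u2) / 2.0, (u1 - u2) / 2.0)


def f_u_known(u1: float, u2: float) -> float:
    """Product of two N(0, 2) densities: exp(-(u1^2 + u2^2)/4) / (4 pi)."""
    return math.exp(-(u1 * u1 + u2 * u2) / 4.0) / (4.0 * math.pi)


points = [(0.0, 0.0), (1.0, -0.5), (2.3, 1.7), (-3.0, 0.4)]
for u1, u2 in points:
    print(u1, u2, f_u_from_10_9(u1, u2), f_u_known(u1, u2))
```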
In exactly the same way one may establish the following result: 


► Example 10B. Let $X_1$ and $X_2$ be jointly continuous random variables. Let

(10.10)   $R = \sqrt{X_1^2 + X_2^2}, \quad \alpha = \tan^{-1}(X_2/X_1).$

Then for any real numbers $r$ and $\alpha$ such that $r \ge 0$ and $0 < \alpha < 2\pi$,

(10.11)   $f_{R, \alpha}(r, \alpha) = r\, f_{X_1, X_2}(r\cos\alpha, r\sin\alpha).$

It should be noted that we immediately obtain from (10.11) the formula for $f_R(r)$ given by (9.13), since

(10.12)   $f_R(r) = \int_0^{2\pi} f_{R, \alpha}(r, \alpha)\, d\alpha.$ ◄
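Formula (10.12) can be evaluated by numerical integration. The sketch below assumes, for illustration only, that $X_1$ and $X_2$ are independent and normal with means given by the hypothetical parameters `mu1`, `mu2` and unit variances. For `mu1 = mu2 = 0` the result should be the Rayleigh density $r e^{-r^2/2}$, and in any case $f_R$ should integrate to 1.

```python
import math


def f_x(x1, x2, mu1=0.0, mu2=0.0):
    """Joint density of independent N(mu1, 1) and N(mu2, 1) variables (illustrative)."""
    return math.exp(-((x1 - mu1) ** 2 + (x2 - mu2) ** 2) / 2.0) / (2.0 * math.pi)


def f_r_alpha(r, a, mu1=0.0, mu2=0.0):
    """Joint density of (R, alpha) from (10.11)."""
    return r * f_x(r * math.cos(a), r * math.sin(a), mu1, mu2)


def f_r(r, mu1=0.0, mu2=0.0, steps=400):
    """Marginal density of R from (10.12), by the midpoint rule over [0, 2 pi)."""
    h = 2.0 * math.pi / steps
    return h * sum(f_r_alpha(r, (k + 0.5) * h, mu1, mu2) for k in range(steps))


print(f_r(1.0), math.exp(-0.5))   # centered case versus the Rayleigh density at r = 1

# Total probability for a shifted case: integrate f_R over 0 <= r <= 10.
dr = 0.02
total = dr * sum(f_r((k + 0.5) * dr, mu1=1.0, mu2=0.5) for k in range(500))
print(total)
```

The midpoint rule is used in the angle because the integrand is periodic, where this simple rule is very accurate.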


► Example 10C. Rotation of axes. Let $X_1$ and $X_2$ be jointly distributed random variables. Let

(10.13)   $Y_1 = X_1\cos\alpha + X_2\sin\alpha, \quad Y_2 = -X_1\sin\alpha + X_2\cos\alpha$

for some angle $\alpha$ in the interval $0 \le \alpha \le 2\pi$. Then

(10.14)   $f_{Y_1, Y_2}(y_1, y_2) = f_{X_1, X_2}(y_1\cos\alpha - y_2\sin\alpha,\; y_1\sin\alpha + y_2\cos\alpha).$

To illustrate the use of (10.14), consider two jointly normally distributed random variables with a joint probability density function given by (9.31), with $m_1 = m_2 = 0$. Then

(10.15)   $f_{Y_1, Y_2}(y_1, y_2) = \dfrac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left\{-\dfrac{1}{2(1 - \rho^2)}\left[A y_1^2 - 2B y_1 y_2 + C y_2^2\right]\right\},$

where

(10.16)   $A = \dfrac{\cos^2\alpha}{\sigma_1^2} - 2\rho\dfrac{\cos\alpha\sin\alpha}{\sigma_1\sigma_2} + \dfrac{\sin^2\alpha}{\sigma_2^2},$
          $B = \dfrac{\sin\alpha\cos\alpha}{\sigma_1^2} + \rho\dfrac{\cos^2\alpha - \sin^2\alpha}{\sigma_1\sigma_2} - \dfrac{\sin\alpha\cos\alpha}{\sigma_2^2},$
          $C = \dfrac{\sin^2\alpha}{\sigma_1^2} + 2\rho\dfrac{\cos\alpha\sin\alpha}{\sigma_1\sigma_2} + \dfrac{\cos^2\alpha}{\sigma_2^2}.$

From (10.15) one sees that two random variables $Y_1$ and $Y_2$, obtained by a rotation of axes from jointly normally distributed random variables $X_1$ and $X_2$, are jointly normally distributed. Further, if the angle of rotation $\alpha$ is chosen so that

(10.17)   $\tan 2\alpha = \dfrac{2\rho\sigma_1\sigma_2}{\sigma_1^2 - \sigma_2^2},$

then $B = 0$, and $Y_1$ and $Y_2$ are independent and normally distributed. Thus by a suitable rotation of axes, two jointly normally distributed random variables may be transformed into two independent normally distributed random variables. ◄
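The choice of angle in (10.17) can be checked numerically. The sketch computes $\alpha$ with `math.atan2` (which also covers $\sigma_1 = \sigma_2$, where (10.17) has a zero denominator), then evaluates $B$ from (10.16) and, independently, $\operatorname{Cov}[Y_1, Y_2]$ computed directly from (10.13). The parameter triples are arbitrary illustrations.

```python
import math


def decorrelating_angle(s1: float, s2: float, rho: float) -> float:
    """An angle alpha with tan(2 alpha) = 2 rho s1 s2 / (s1^2 - s2^2), as in (10.17)."""
    return 0.5 * math.atan2(2.0 * rho * s1 * s2, s1 * s1 - s2 * s2)


def coefficient_b(alpha: float, s1: float, s2: float, rho: float) -> float:
    """The coefficient B of (10.16)."""
    c, s = math.cos(alpha), math.sin(alpha)
    return s * c / s1**2 + rho * (c * c - s * s) / (s1 * s2) - s * c / s2**2


def covariance_y1_y2(alpha: float, s1: float, s2: float, rho: float) -> float:
    """Cov[Y1, Y2] computed directly from (10.13), with Cov[X1, X2] = rho s1 s2."""
    c, s = math.cos(alpha), math.sin(alpha)
    return c * s * (s2**2 - s1**2) + (c * c - s * s) * rho * s1 * s2


cases = [(2.0, 1.0, 0.6), (1.0, 3.0, -0.4), (1.5, 1.5, 0.8)]
for s1, s2, rho in cases:
    a = decorrelating_angle(s1, s2, rho)
    print(f"alpha={a:.4f}  B={coefficient_b(a, s1, s2, rho):.1e}  "
          f"Cov={covariance_y1_y2(a, s1, s2, rho):.1e}")
```

Both quantities vanish (up to rounding) at the angle of (10.17), in line with the conclusion that $Y_1$ and $Y_2$ are then independent.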


THEORETICAL EXERCISES 


10.1. Let $X_1$ and $X_2$ be independent random variables, each exponentially distributed with parameter $\lambda$. Show that the random variables $X_1 + X_2$ and $X_1/X_2$ are independent.

10.2. Let $X_1$ and $X_2$ be independent random variables, each normally distributed with parameters $m = 0$ and $\sigma > 0$. Show that $X_1^2 + X_2^2$ and $X_1/X_2$ are independent.

10.3. Let $X_1$ and $X_2$ be independent random variables, $\chi^2$ distributed with $n_1$ and $n_2$ degrees of freedom, respectively. Show that $X_1 + X_2$ and $X_1/X_2$ are independent.

10.4. Let $X_1$, $X_2$, and $X_3$ be independent identically normally distributed random variables. Let $\bar{X} = (X_1 + X_2 + X_3)/3$ and $S = (X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + (X_3 - \bar{X})^2$. Show that $\bar{X}$ and $S$ are independent.


10.5. Generation of a random sample of a normally distributed random variable. Let $U_1$, $U_2$ be independent random variables, each uniformly distributed on the interval 0 to 1. Show that the random variables

$X_1 = (-2\log_e U_1)^{1/2}\cos 2\pi U_2, \quad X_2 = (-2\log_e U_1)^{1/2}\sin 2\pi U_2$

are independent random variables, each normally distributed with mean 0 and variance 1. (For a discussion of this result, see G. E. P. Box and Mervin E. Muller, “A note on the generation of random normal deviates,” Annals of Mathematical Statistics, Vol. 29 (1958), pp. 610-611.)
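Theoretical exercise 10.5 is the basis of the Box-Muller method for generating normal random numbers. A minimal sketch using Python's `random` module follows; `1.0 - rng.random()` is used so that the argument of the logarithm lies in (0, 1]. The summary statistics printed at the end are only a rough sanity check, not a proof of normality or independence.

```python
import math
import random


def box_muller(rng: random.Random) -> tuple[float, float]:
    """Map two independent uniforms to (X1, X2) as in theoretical exercise 10.5."""
    u1 = 1.0 - rng.random()      # in (0, 1], so log(u1) is finite
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def mean(values):
    return sum(values) / len(values)


rng = random.Random(7)
pairs = [box_muller(rng) for _ in range(100_000)]
xs = [p[0] for p in pairs]
ys = [p[1] for p in pairs]

mx, my = mean(xs), mean(ys)
vx = mean([(x - mx) ** 2 for x in xs])
vy = mean([(y - my) ** 2 for y in ys])
cxy = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
print(f"means {mx:.3f}, {my:.3f}; variances {vx:.3f}, {vy:.3f}; covariance {cxy:.3f}")
```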


EXERCISES 


10.1. Let $X_1$ and $X_2$ be independent random variables, each exponentially distributed with parameter $\lambda = \tfrac{1}{2}$. Find the joint probability density function of $Y_1$ and $Y_2$, in which (i) $Y_1 = X_1 + X_2$, $Y_2 = X_1 - X_2$, (ii) $Y_1 = $ maximum $(X_1, X_2)$, $Y_2 = $ minimum $(X_1, X_2)$.

10.2. Let $X_1$ and $X_2$ have joint probability density function given by

$f_{X_1, X_2}(x_1, x_2) = \dfrac{1}{\pi}$ if $x_1^2 + x_2^2 \le 1$,
$= 0$ otherwise.

Find the joint probability density function of $(R, \theta)$, in which $R = \sqrt{X_1^2 + X_2^2}$ and $\theta = \tan^{-1}(X_2/X_1)$. Show that, and explain why, $R^2$ is uniformly distributed but $R$ is not.

10.3. Let $X$ and $Y$ be independent random variables, each uniformly distributed over the interval 0 to 1. Find the individual and joint probability density functions of the random variables $R$ and $\theta$, in which $R = \sqrt{X^2 + Y^2}$ and $\theta = \tan^{-1}(Y/X)$.

10.4. Two voltages $X(t)$ and $Y(t)$ are independently and normally distributed with parameters $m = 0$ and $\sigma = 1$. These are combined to give two new voltages, $U(t) = X(t) + Y(t)$ and $V(t) = X(t) - Y(t)$. Find the joint probability density function of $U(t)$ and $V(t)$. Are $U(t)$ and $V(t)$ independent? Find $P[U(t) > 0, V(t) < 0]$.


11. CONDITIONAL PROBABILITY OF AN EVENT GIVEN A
RANDOM VARIABLE. CONDITIONAL DISTRIBUTIONS


In this section we introduce a notion that is basic to the theory of
random processes, the notion of the conditional probability of a random
event A, given a random variable X. This notion forms the basis of the


SEC. 11 CONDITIONAL PROBABILITY OF AN EVENT 335 


mathematical treatment of jointly distributed random variables that are 
not independent and, consequently, are dependent. 

Given two events, $A$ and $B$, on the same probability space, the conditional probability $P[A \mid B]$ of the event $A$, given the event $B$, has been defined:

(11.1)   $P[A \mid B] = \dfrac{P[AB]}{P[B]}$ if $P[B] > 0$,
         $=$ undefined if $P[B] = 0$.


Now suppose we are given an event $A$ and a random variable $X$, both defined on the same probability space. We wish to define, for any real number $x$, the conditional probability of the event $A$, given the event that the observed value of $X$ is equal to $x$, denoted in symbols by $P[A \mid X = x]$. Now if $P[X = x] > 0$, we may define this conditional probability by (11.1). However, for any random variable $X$, $P[X = x] = 0$ for all (except, at most, a countable number of) values of $x$. Consequently, the conditional probability $P[A \mid X = x]$ of the event $A$, given that $X = x$, must be regarded as being undefined insofar as (11.1) is concerned.

The meaning that one intuitively assigns to $P[A \mid X = x]$ is that it represents the probability that $A$ has occurred, knowing that $X$ was observed as equal to $x$. Therefore, it seems natural to define

(11.2)   $P[A \mid X = x] = \lim_{h \to 0} P[A \mid x - h < X < x + h]$

if the conditioning events $[x - h < X < x + h]$ have positive probability for every $h > 0$. However, we have to be very careful how we define the limit in (11.2). As stated, (11.2) is essentially false, in the sense that the limit does not exist in general. However, we can define a limiting operation, similar to (11.2) in spirit, although different in detail, that in advanced probability theory is shown always to exist.

Given a real number $x$, define $H_n(x)$ as that interval, of length $1/2^n$, starting at a multiple of $1/2^n$, that contains $x$; in symbols,

(11.3)   $H_n(x) = \left\{y : \dfrac{\lfloor 2^n x \rfloor}{2^n} \le y < \dfrac{\lfloor 2^n x \rfloor + 1}{2^n}\right\}.$

Then we define the conditional probability of the event $A$, given that the random variable $X$ has an observed value equal to $x$, by

(11.4)   $P[A \mid X = x] = \lim_{n \to \infty} P[A \mid X \text{ is in } H_n(x)].$

It may be proved that the conditional probability $P[A \mid X = x]$, defined by (11.4), has the following properties.


First, the convergence set $C$ of points $x$ on the real line at which the limit in (11.4) exists has probability one, according to the probability function of the random variable $X$; that is, $P_X[C] = 1$. For practical purposes this suffices, since we expect that all observed values of $X$ lie in the set $C$, and we wish to define $P[A \mid X = x]$ only at points $x$ that could actually arise as observed values of $X$.

Second, from a knowledge of $P[A \mid X = x]$ one may obtain $P[A]$ by the following formulas:

(11.5)   $P[A] = \int_{-\infty}^{\infty} P[A \mid X = x]\, dF_X(x)$
         $= \int_{-\infty}^{\infty} P[A \mid X = x]\, f_X(x)\, dx$
         $= \sum_{\text{over all } x \text{ such that } p_X(x) > 0} P[A \mid X = x]\, p_X(x),$

in which the last two equations hold if $X$ is respectively continuous or discrete. More generally, for every Borel set $B$ of real numbers, the probability of the intersection of the event $A$ and the event $\{X \text{ is in } B\}$ that the observed value of $X$ is in $B$ is given by

(11.6)   $P[A\{X \text{ is in } B\}] = \int_B P[A \mid X = x]\, dF_X(x).$

Indeed, in advanced studies of probability theory the conditional probability $P[A \mid X = x]$ is defined not constructively by (11.4) but descriptively, as the unique (almost everywhere) function of $x$ satisfying (11.6) for every Borel set $B$ of real numbers. This characterization of $P[A \mid X = x]$ is used to prove (11.15).


► Example 11A. A young man and a young lady plan to meet between 5:00 and 6:00 P.M., each agreeing not to wait more than ten minutes for the other. Assume that they arrive independently at random times between 5:00 and 6:00 P.M. Find the conditional probability that the young man and the young lady will meet, given that the young man arrives at 5:30 P.M.

Solution: Let $X$ be the man's arrival time (in minutes after 5:00 P.M.) and let $Y$ be the lady's arrival time (in minutes after 5:00 P.M.). If the man arrives at a time $x$, there will be a meeting if and only if the lady's arrival time $Y$ satisfies $|Y - x| \le 10$, or $-10 + x \le Y \le x + 10$. Let $A$ denote the event that the man and lady meet. Then, for any $x$ between 0 and 60,

(11.7)   $P[A \mid X = x] = P[-10 \le Y - X \le 10 \mid X = x]$
         $= P[-10 + x \le Y \le x + 10 \mid X = x]$
         $= P[-10 + x \le Y \le x + 10],$

in which we have used (11.9) and (11.11). Next, using the fact that $Y$ is uniformly distributed between 0 and 60, we obtain (as graphed in Fig. 11A)

(11.8)   $P[A \mid X = x] = \dfrac{10 + x}{60}$ if $0 \le x \le 10$,
         $= \dfrac{1}{3}$ if $10 \le x \le 50$,
         $= \dfrac{70 - x}{60}$ if $50 \le x \le 60$,
         $=$ undefined if $x < 0$ or $x > 60$.

Fig. 11A. The conditional probability $P[A \mid X = x]$, graphed as a function of $x$.

Consequently, $P[A \mid X = 30] = 1/3$, so that the conditional probability that the young man and the young lady will meet, given that the young man arrives at 5:30 P.M., is $1/3$. Further, by applying (11.5), we determine that $P[A] = 11/36$. ◄
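The two numbers in Example 11A can be reproduced exactly with rational arithmetic. The sketch encodes (11.8) (the helper name `p_meet_given_x` is ours) and applies (11.5) with $f_X(x) = 1/60$ on 0 to 60. Because $P[A \mid X = x]$ is linear on every interval between consecutive integers (its breakpoints 10 and 50 are integers), the midpoint rule over unit intervals gives the integral exactly.

```python
from fractions import Fraction


def p_meet_given_x(x) -> Fraction:
    """P[A | X = x] from (11.8): the lady must arrive in [x - 10, x + 10] within [0, 60]."""
    if x < 0 or x > 60:
        raise ValueError("P[A | X = x] is undefined for x outside [0, 60]")
    lo, hi = max(0, x - 10), min(60, x + 10)
    return (hi - lo) / Fraction(60)   # Y is uniform on [0, 60]


# (11.5): P[A] is the integral of P[A | X = x] f_X(x) dx with f_X(x) = 1/60 on [0, 60].
# The midpoint rule on unit intervals is exact for this piecewise-linear integrand.
p_a = sum(p_meet_given_x(k + Fraction(1, 2)) for k in range(60)) / 60

print(p_meet_given_x(30), p_a)
```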

In (11.7) we performed certain manipulations that arise frequently when one is dealing with conditional probabilities. We now justify these manipulations.

Consider two jointly distributed random variables $X$ and $Y$. Let $g(x, y)$ be a Borel function of two variables. Let $z$ be a fixed real number. Let $A = [g(X, Y) \le z]$ be the event that the random variable $g(X, Y)$ has an observed value less than or equal to $z$. Next, let $x$ be a fixed real number, and let $A(x) = [g(x, Y) \le z]$ be the event that the random variable $g(x, Y)$, which is a function only of $Y$, has an observed value less than or equal to $z$. It appears formally reasonable that

(11.9)   $P[g(X, Y) \le z \mid X = x] = P[g(x, Y) \le z \mid X = x].$

In words, a statement involving the random variable $X$, conditioned by the hypothesis that the value of $X$ is a given number $x$, has the same conditional probability, given $X = x$, as the corresponding statement obtained by replacing the random variable $X$ by its observed value. The proof of (11.9) is omitted, since it is beyond the scope of this book.

It may help to comprehend (11.9) if we state it in terms of the events $A = [g(X, Y) \le z]$ and $A(x) = [g(x, Y) \le z]$. Equation (11.9) asserts that the functions of $u$

(11.10)   $P[A \mid X = u]$ and $P[A(x) \mid X = u]$

have the same value at $u = x$.

Another important formula is the following. If the random variables $X$ and $Y$ are independent, then

(11.11)   $P[g(x, Y) \le z \mid X = x] = P[g(x, Y) \le z],$

since it holds that

(11.12)   $P[A \mid X = x] = P[A]$

if the event $A$ is independent of $X$. We thus obtain the basic fact that if the random variables $X$ and $Y$ are independent

(11.13)   $P[g(X, Y) \le z \mid X = x] = P[g(x, Y) \le z].$

We next define the notion of the conditional distribution function of one random variable $Y$, given another random variable $X$, denoted $F_{Y|X}(\cdot \mid \cdot)$. For any real numbers $x$ and $y$, it is defined by

(11.14)   $F_{Y|X}(y \mid x) = P[Y \le y \mid X = x].$

The conditional distribution function $F_{Y|X}(\cdot \mid \cdot)$ has the basic property that for any real numbers $x$ and $y$ the joint distribution function $F_{X,Y}(x, y)$ may be expressed in terms of $F_{Y|X}(y \mid x)$ by

(11.15)   $F_{X,Y}(x, y) = \int_{-\infty}^{x} F_{Y|X}(y \mid x')\, dF_X(x').$

To prove (11.15), let $X$ and $Y$ be two jointly distributed random variables. For two given real numbers $x$ and $y$ define $A = [Y \le y]$. Then (11.15) may be written

(11.16)   $P[A\{X \le x\}] = \int_{-\infty}^{x} P[A \mid X = x']\, dF_X(x').$

If in (11.6) $B = \{x' : x' \le x\}$, (11.16) is obtained.


SEC. 11 CONDITIONAL PROBABILITY OF AN EVENT 339 


Now suppose that the random variables $X$ and $Y$ are jointly continuous. We may then define the conditional probability density function of the random variable $Y$, given the random variable $X$, denoted by $f_{Y|X}(y \mid x)$. It is defined for any real numbers $x$ and $y$ by

(11.17)   $f_{Y|X}(y \mid x) = \dfrac{\partial}{\partial y} F_{Y|X}(y \mid x).$

We now prove the basic formula: if $f_X(x) > 0$, then

(11.18)   $f_{Y|X}(y \mid x) = \dfrac{f_{X,Y}(x, y)}{f_X(x)}.$

To prove (11.18), we differentiate (11.15) with respect to $x$ (first replacing $dF_X(x')$ by $f_X(x')\, dx'$). Then

(11.19)   $\dfrac{\partial}{\partial x} F_{X,Y}(x, y) = F_{Y|X}(y \mid x)\, f_X(x).$

Now differentiating (11.19) with respect to $y$, we obtain

(11.20)   $f_{X,Y}(x, y) = f_{Y|X}(y \mid x)\, f_X(x),$

from which (11.18) follows immediately.


► Example 11B. Let $X_1$ and $X_2$ be jointly normally distributed random variables whose probability density function is given by (9.31). Then the conditional probability density of $X_1$, given $X_2$, is equal to

(11.21)   $f_{X_1|X_2}(x_1 \mid x_2) = \dfrac{1}{\sigma_1\sqrt{2\pi}\sqrt{1 - \rho^2}} \exp\left\{-\dfrac{1}{2\sigma_1^2(1 - \rho^2)}\left[x_1 - m_1 - \rho\dfrac{\sigma_1}{\sigma_2}(x_2 - m_2)\right]^2\right\}.$

In words, the conditional probability law of the random variable $X_1$, given $X_2$, is the normal probability law with parameters $m = m_1 + \rho(\sigma_1/\sigma_2)(x_2 - m_2)$ and $\sigma = \sigma_1\sqrt{1 - \rho^2}$. To prove (11.21), one need only verify that it is equal to the quotient $f_{X_1,X_2}(x_1, x_2)/f_{X_2}(x_2)$. Similarly, one may establish the following result. ◄

► Example 11C. Let $X$ and $Y$ be jointly distributed random variables. Let

(11.22)   $R = \sqrt{X^2 + Y^2}, \quad \theta = \tan^{-1}(Y/X).$

Then, for $r > 0$,

(11.23)   $f_{\theta|R}(\theta \mid r) = \dfrac{f_{X,Y}(r\cos\theta, r\sin\theta)}{\int_0^{2\pi} f_{X,Y}(r\cos\theta', r\sin\theta')\, d\theta'}.$ ◄

In the foregoing examples we have considered the problem of obtaining $f_{Y|X}(y \mid x)$, knowing $f_{X,Y}(x, y)$. We next consider the converse problem of obtaining the individual probability law of $X$ from a knowledge of the conditional probability law of $X$, given $Y$, and of the individual probability law of $Y$.


► Example 11D. Consider the decay of particles in a cloud chamber (or, similarly, the breakdown of equipment or the occurrence of accidents). Assume that the time $X$ of any particular particle to decay is a random variable obeying an exponential probability law with parameter $y$. However, it is not assumed that the value of $y$ is the same for all particles. Rather, it is assumed that there are particles of different types (or equipment of different types or individuals of different accident proneness). More specifically, it is assumed that for a particle randomly selected from the cloud chamber the parameter $y$ is a particular value of a random variable $Y$ obeying a gamma probability law with a probability density function

(11.24)   $f_Y(y) = \dfrac{\beta^\alpha}{\Gamma(\alpha)}\, y^{\alpha - 1} e^{-\beta y}$ for $y > 0$,

in which the parameters $\alpha$ and $\beta$ are positive constants characterizing the experimental conditions under which the particles are observed.

The assumption that the time $X$ of a particle to decay obeys an exponential law is now expressed as an assumption on the conditional probability law of $X$ given $Y$:

(11.25)   $f_{X|Y}(x \mid y) = y e^{-yx}$ for $x > 0$.

We find the individual probability law of the time $X$ (of a particle selected at random to decay) as follows; for $x > 0$

(11.26)   $f_X(x) = \int_0^\infty f_{X,Y}(x, y)\, dy = \int_0^\infty f_{X|Y}(x \mid y)\, f_Y(y)\, dy$
          $= \int_0^\infty y e^{-yx}\, \dfrac{\beta^\alpha}{\Gamma(\alpha)}\, y^{\alpha - 1} e^{-\beta y}\, dy$
          $= \dfrac{\alpha \beta^\alpha}{(\beta + x)^{\alpha + 1}}.$

The reader interested in further study of the foregoing model, as well as a number of other interesting topics, should consult J. Neyman, “The Problem of Inductive Inference,” Communications on Pure and Applied Mathematics, Vol. 8 (1955), pp. 13-46. ◄
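The closed form in (11.26) can be compared with a direct numerical evaluation of the mixing integral. In the sketch the values $\alpha = 2$, $\beta = 1$, the truncation point 60, and the step count are arbitrary illustrative choices; the integral over $(0, \infty)$ is approximated by the midpoint rule over $(0, 60)$, where the remaining tail is negligible for these parameters.

```python
import math


def gamma_pdf(y, alpha, beta):
    """Gamma density (11.24) of the decay parameter Y."""
    return beta**alpha / math.gamma(alpha) * y ** (alpha - 1) * math.exp(-beta * y)


def f_x_numeric(x, alpha, beta, upper=60.0, steps=20_000):
    """(11.26) by the midpoint rule: integral of y e^{-yx} f_Y(y) dy over (0, upper)."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        y = (k + 0.5) * h
        total += y * math.exp(-y * x) * gamma_pdf(y, alpha, beta)
    return total * h


def f_x_closed(x, alpha, beta):
    """Closed form from (11.26): alpha beta^alpha / (beta + x)^(alpha + 1)."""
    return alpha * beta**alpha / (beta + x) ** (alpha + 1)


for x in (0.1, 0.5, 2.0):
    print(x, f_x_numeric(x, alpha=2.0, beta=1.0), f_x_closed(x, alpha=2.0, beta=1.0))
```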


The foregoing notions may be extended to several random variables. In particular, let us consider $n$ random variables $X_1, X_2, \ldots, X_n$ and a random variable $U$, all of which are jointly distributed. By suitably adapting the foregoing considerations, we may define a function

(11.27)   $F_{X_1, \ldots, X_n | U}(x_1, \ldots, x_n \mid u),$

called the conditional distribution function of the random variables $X_1, X_2, \ldots, X_n$, given the random variable $U$, which may be shown to satisfy, for all real numbers $x_1, x_2, \ldots, x_n$ and $u$,

(11.28)   $F_{X_1, \ldots, X_n, U}(x_1, \ldots, x_n, u) = \int_{-\infty}^{u} F_{X_1, \ldots, X_n | U}(x_1, \ldots, x_n \mid u')\, dF_U(u').$


THEORETICAL EXERCISES 


11.1. Let $T$ be a random variable, and let $t$ be a fixed number. Define the random variable $U$ by $U = T - t$ and the event $A$ by $A = [T > t]$. Evaluate $P[A \mid U = x]$ and $P[U > x \mid A]$ in terms of the distribution function of $T$. Explain the difference in meaning between these concepts.

11.2. If $X$ and $Y$ are independent Poisson random variables, show that the conditional distribution of $X$, given $X + Y$, is binomial.

11.3. Given jointly distributed random variables $X_1$ and $X_2$, prove that, for any $x_2$ and almost all $x_1$, $F_{X_2|X_1}(x_2 \mid x_1) = F_{X_2}(x_2)$ if and only if $X_1$ and $X_2$ are independent.

11.4. Prove that for any jointly distributed random variables $X_1$ and $X_2$

$\int_{-\infty}^{\infty} f_{X_1|X_2}(x_1 \mid x_2)\, dx_1 = 1, \quad \int_{-\infty}^{\infty} f_{X_2|X_1}(x_2 \mid x_1)\, dx_2 = 1.$

For contrast evaluate

$\int_{-\infty}^{\infty} f_{X_1|X_2}(x_1 \mid x_2)\, dx_2, \quad \int_{-\infty}^{\infty} f_{X_2|X_1}(x_2 \mid x_1)\, dx_1.$


EXERCISES 


In exercises 11.1 to 11.3 let $X$ and $Y$ be independent random variables. Let $Z = Y - X$. Let $A = [|Y - X| \le 1]$. Find (i) $P[A \mid X = 1]$, (ii) $F_{Z|X}(0 \mid 1)$, (iii) $f_{Z|X}(0 \mid 1)$, (iv) $P[Z \le 0 \mid A]$.

11.1. If $X$ and $Y$ are each uniformly distributed over the interval 0 to 2.

11.2. If $X$ and $Y$ are each normally distributed with parameters $m = 0$ and $\sigma = 2$.

11.3. If $X$ and $Y$ are each exponentially distributed with parameter $\lambda = 1$.


In exercises 11.4 to 11.6 let $X$ and $Y$ be independent random variables. Let $U = X + Y$, $V = Y - X$. Let $A = [|V| \le 1]$. Find (i) $P[A \mid U = 1]$, (ii) $F_{V|U}(0 \mid 1)$, (iii) $f_{V|U}(0 \mid 1)$, (iv) PIU = 0] А], (v) (ои).


11.4. If $X$ and $Y$ are each uniformly distributed over the interval 0 to 2.

11.5. If $X$ and $Y$ are each normally distributed with parameters $m = 0$ and $\sigma = 2$.

11.6. If $X$ and $Y$ are each exponentially distributed with parameter $\lambda = 1$.

11.7. Let $X_1$ and $X_2$ be jointly normally distributed random variables (representing the observed amplitudes of a noise voltage recorded a known time interval apart). Assume that their joint probability density function is given by (9.31) with (i) $m_1 = m_2 = 0$, $\sigma_1 = \sigma_2 = 1$, $\rho = 0.5$, (ii) $m_1 = 1$, $m_2 = 2$, $\sigma_1 = 1$, $\sigma_2 = 4$, $\rho = 0.5$. Find $P[X_2 > 1 \mid X_1 = 1]$.

11.8. Let $X_1$ and $X_2$ be jointly normally distributed random variables, representing the daily sales (in thousands of units) of a certain product in a certain store on two successive days. Assume that the joint probability density function of $X_1$ [...] with $m_1 = m_2 = 3$, $\sigma_1 = \sigma_2 = 1$, $\rho = 0.8$. Find [...] $= 0.05$, (ii) $P[X_2 > K \mid X_1 = 2] = 0.05$, [...] $1] = 0.05$. Suppose the store desires to have on hand on a given day enough units of the product so that with probability 0.95 it can supply all demands for the product on the day. How large [...] a given morning if (iv) yesterday's sales were [...], (v) yesterday's sales are not known.


CHAPTER 8 


Expectation 
of a Random Variable 


In dealing with random variables, it is as important to know their means 
and variances as it is to know their probability laws. In this chapter we 
define the notion of the expectation of a random variable and describe the 
significant role that this notion plays in probability theory.


1. EXPECTATION, MEAN, AND VARIANCE 
OF A RANDOM VARIABLE 


Given the random variable $X$, we define the expectation of the random variable, denoted by $E[X]$, as the mean of the probability law of $X$; in symbols,

(1.1)   $E[X] = \int_{-\infty}^{\infty} x\, dF_X(x)$
        $= \int_{-\infty}^{\infty} x f_X(x)\, dx$
        $= \sum_{\text{over all } x \text{ such that } p_X(x) > 0} x\, p_X(x),$

depending on whether $X$ is specified by its distribution function $F_X(\cdot)$, its probability density function $f_X(\cdot)$, or its probability mass function $p_X(\cdot)$.


344 EXPECTATION OF A RANDOM VARIABLE cH. 8 


Given a random variable $Y$, which arises as a Borel function of a random variable $X$ so that

(1.2)   $Y = g(X)$

for some Borel function $g(\cdot)$, the expectation $E[g(X)]$, in view of (1.1), is given by

(1.3)   $E[g(X)] = \int_{-\infty}^{\infty} y\, dF_{g(X)}(y).$


On the other hand, given the Borel function $g(\cdot)$ and the random variable $X$, we can form the expectation of $g(x)$ with respect to the probability law of $X$, denoted by $E_X[g(x)]$ and defined by

(1.4)   $E_X[g(x)] = \int_{-\infty}^{\infty} g(x)\, dF_X(x)$
        $= \int_{-\infty}^{\infty} g(x) f_X(x)\, dx$
        $= \sum_{\text{over all } x \text{ such that } p_X(x) > 0} g(x)\, p_X(x),$

depending on whether $X$ is specified by its distribution function $F_X(\cdot)$, its probability density function $f_X(\cdot)$, or its probability mass function $p_X(\cdot)$.


It is a striking fact, of great importance in probability theory, that for any random variable $X$ and Borel function $g(\cdot)$

(1.5)   $E[g(X)] = E_X[g(x)]$

if either of these expectations exists. In words, (1.5) says that the expectation of the random variable $g(X)$ is equal to the expectation of the function $g(\cdot)$ with respect to the random variable $X$.
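For a discrete random variable, (1.5) can be seen concretely by computing both sides. The sketch below (function names ours) computes $E[g(X)]$ as in (1.3), by first forming the probability mass function of $Y = g(X)$, and $E_X[g(x)]$ as in (1.4), by weighting $g(x)$ with $p_X(x)$. The choice of $X$ uniform on $\{-2, -1, 0, 1, 2\}$ and $g(x) = x^2$ is only an illustration; it is a case where $g$ is not one-to-one, so the two computations genuinely differ.

```python
from collections import defaultdict
from fractions import Fraction


def expectation_via_law_of_g(p_x, g):
    """E[g(X)] as in (1.3): first find the probability mass function of Y = g(X)."""
    p_y = defaultdict(Fraction)
    for x, p in p_x.items():
        p_y[g(x)] += p
    return sum(y * p for y, p in p_y.items())


def expectation_wrt_x(p_x, g):
    """E_X[g(x)] as in (1.4): weight g(x) by the probability mass function of X."""
    return sum(g(x) * p for x, p in p_x.items())


def square(x):
    return x * x


# Illustration: X uniform on {-2, -1, 0, 1, 2} and g(x) = x^2.
p_x = {x: Fraction(1, 5) for x in (-2, -1, 0, 1, 2)}
print(expectation_via_law_of_g(p_x, square), expectation_wrt_x(p_x, square))
```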

The validity of (1.5) is a direct consequence of the fact that the integrals used to define expectations are required to be absolutely convergent.* Some idea of the proof of (1.5), in the case that $g(\cdot)$ is continuous, can be gained as follows. Partition the $y$-axis in Fig. 1A into subintervals by points $y_0 < y_1 < \cdots < y_n$. Then approximately

(1.6)   $E[g(X)] = \int_{-\infty}^{\infty} y\, dF_{g(X)}(y) \approx \sum_{j=1}^{n} y_j\,[F_{g(X)}(y_j) - F_{g(X)}(y_{j-1})] = \sum_{j=1}^{n} y_j\, P[y_{j-1} < g(X) \le y_j].$

* At the end of the section we give an example that shows that (1.5) does not hold if the integrals used to define expectations are not required to converge absolutely.


SEC. |. EXPECTATION, MEAN, AND VARIANCE OF A RANDOM VARIABLE 345 


To each point $y_j$ on the $y$-axis, there is a number of points $x_j^{(1)}, x_j^{(2)}, \ldots$ at which $g(x)$ is equal to $y_j$. Form the set of all such points on the $x$-axis that correspond to the points $y_0, y_1, \ldots, y_n$. Arrange these points in increasing order, $x_0 < x_1 < \cdots < x_m$. These points divide the $x$-axis into subintervals, on each of which $g(x)$ stays between two consecutive points of the partition of the $y$-axis. Further, it is clear upon reflection that the last sum in (1.6) is approximately equal to

(1.7)   $\sum_{k=1}^{m} g(x_k)\, P[x_{k-1} < X \le x_k] \approx \int_{-\infty}^{\infty} g(x)\, dF_X(x) = E_X[g(x)],$

which completes our intuitive proof of (1.5). A rigorous proof of (1.5) cannot be attempted here, since a more careful treatment of the integration process does not lie within the scope of this book.


Fig. 1A. With the aid of this graph of a possible function $g(\cdot)$,
one can see that (1.5) holds.


Given a random variable $X$ and a function $g(\cdot)$, we thus find two distinct notions, represented by $E[g(X)]$ and $E_X[g(x)]$, which, nevertheless, are always numerically equal. It has become customary always to use the notation $E[g(X)]$, since this notation is the most convenient for technical manipulation. However, the reader should be aware that although we write $E[g(X)]$ the concept in which we are really very often interested is $E_X[g(x)]$, the expectation of the function $g(x)$ with respect to the random variable $X$. Thus, for example, the $n$th moment of a random variable $X$ (for any integer $n$) is often defined as $E[X^n]$, the expectation of the $n$th power of $X$. From the point of view of the intuitive meaning of the $n$th moment, however, it should be defined as the expectation $E_X[x^n]$ of the function $g(x) = x^n$ with respect to the probability law of the random variable $X$. We shall define the moments of a random variable in terms of the notation of the expectation of a random variable. However, it should be borne in mind that we could just as well define the moments of a random variable as the corresponding moments of the probability law of the random variable.

Given a random variable $X$, we denote its mean by $E[X]$, its mean square by $E[X^2]$, its square mean by $E^2[X]$, its $n$th moment about the point $c$ by $E[(X - c)^n]$, and its $n$th central moment (that is, $n$th moment about its mean) by $E[(X - E[X])^n]$. In particular, the variance of a random variable, denoted by $\operatorname{Var}[X]$, is defined as its second central moment, so that

(1.8)   $\operatorname{Var}[X] = E[(X - E[X])^2] = E[X^2] - E^2[X].$

The standard deviation of a random variable, denoted by $\sigma[X]$, is defined as the positive square root of its variance, so that

(1.9)   $\sigma[X] = \sqrt{\operatorname{Var}[X]}, \quad \sigma^2[X] = \operatorname{Var}[X].$

The moment-generating function of a random variable, denoted by $\psi_X(\cdot)$, is defined for every real number $t$ by

(1.10)   $\psi_X(t) = E[e^{tX}].$


It is shown in section 5 that if $X_1, X_2, \ldots, X_n$ constitute a random sample
of the random variable X, then the arithmetic mean $(X_1 + X_2 + \cdots + X_n)/n$
is, for large n, approximately equal to the mean E[X]. This fact has led
early writers on probability theory to call E[X] the expected value of the
random variable X; this terminology, however, is somewhat misleading,
for if E[X] is the expected value of any random variable, it is the expected
value of the arithmetic mean of a random sample of the random variable.
► Example 1A. The mean duration of the game of "odd man out." The
game of "odd man out" was described in example 3D of Chapter 3. On
each independent play of the game, N players independently toss fair coins.
The game concludes when there is an odd man; that is, the game concludes
when exactly one of the coins falls heads or exactly one of the
coins falls tails. Let X be the number of plays required to conclude the
game; more briefly, X is called the duration of the game. Find the mean
and standard deviation of X.

Solution: It has been shown that the random variable X obeys a
geometric probability law with parameter $p = N/2^{N-1}$. The mean of X
is then equal to the mean of the geometric probability law, so that
$E[X] = 1/p$. Similarly, $\sigma^2[X] = q/p^2$, where $q = 1 - p$. Thus, if N = 5,
$E[X] = 16/5 = 3.2$, $\sigma^2[X] = (11/16)/(5/16)^2 = (11)(16)/25$, and
$\sigma[X] = 4\sqrt{11}/5 = 2.65$. The


SEC. | EXPECTATION, MEAN, AND VARIANCE OF A RANDOM VARIABLE 347 


mean duration E[X] has the following interpretation: if $X_1, X_2, \ldots, X_n$
are the durations of n independent games of "odd man out," then the
average duration $(X_1 + X_2 + \cdots + X_n)/n$ of the n games is approximately
equal to E[X] if the number n of games is large. Note that in a game with
five players the mean duration E[X] (= 3.2) is not equal to an integer.
Consequently, one will never observe a game whose duration is equal to
the mean duration; nevertheless, the arithmetic mean of a large number of
observed durations can be expected to be approximately equal to the mean
duration. ◄
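The arithmetic of Example 1A can be checked with a short Python sketch. It assumes, as the solution states, that the duration is geometric with parameter $p = N/2^{N-1}$ (valid for N ≥ 3), so that $E[X] = 1/p$ and $\sigma^2[X] = q/p^2$; the function names are illustrative only. A small seeded simulation is added to show the average duration settling near E[X], which is the interpretation given above.

```python
import math
import random
from fractions import Fraction

def odd_man_out_moments(n_players):
    """Exact E[X] and Var[X] for the geometric duration of Example 1A (n_players >= 3)."""
    p = Fraction(n_players, 2 ** (n_players - 1))  # P[odd man on one play] = 2N / 2**N
    q = 1 - p
    return 1 / p, q / p ** 2                       # mean 1/p, variance q/p**2

def simulate_duration(n_players, rng):
    """Plays until exactly one coin differs from all the others; returns the number of plays."""
    plays = 0
    while True:
        plays += 1
        heads = sum(rng.random() < 0.5 for _ in range(n_players))
        if heads == 1 or heads == n_players - 1:
            return plays

mean, var = odd_man_out_moments(5)
print(mean, var, math.sqrt(var))   # 16/5, 176/25, and a standard deviation of about 2.65

rng = random.Random(1)             # fixed seed so the run is reproducible
games = [simulate_duration(5, rng) for _ in range(20_000)]
print(sum(games) / len(games))     # an average duration near 3.2
```

Exact fractions are used for the closed forms so that $16/5$ and $176/25$ come out exactly rather than as rounded decimals.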


To find the mean and variance of the random variable X in the foregoing
example, we found the mean and variance of the probability law of X.
If a random variable Y can be represented as a Borel function $Y = g_1(X)$
of a random variable X, one can find the mean and variance of Y without
actually finding the probability law of Y. To do this, we make use of an
extension of (1.5).

Let X and Y be random variables such that $Y = g_1(X)$ for some Borel
function $g_1(\cdot)$. Then for any Borel function g(·)

(1.11)  $$E[g(Y)] = E[g(g_1(X))]$$

in the sense that if either of these expectations exists then so does the other,
and the two are equal.
To prove (1.11) we must prove that

(1.12)  $$\int_{-\infty}^{\infty} g(y)\,dF_Y(y) = \int_{-\infty}^{\infty} g(g_1(x))\,dF_X(x).$$

The proof of (1.12) is beyond the scope of this book.

To illustrate the meaning of (1.11), we write it for the case in which the
random variable X is continuous and $g_1(x) = x^2$. Using the formula for
the probability density function of $Y = X^2$, given by (8.8) of Chapter 7,
we have for any continuous function g(·)

(1.13)  $$E[g(Y)] = \int_0^{\infty} g(y) f_Y(y)\,dy = \int_0^{\infty} g(y)\,\frac{f_X(\sqrt{y}) + f_X(-\sqrt{y})}{2\sqrt{y}}\,dy,$$

whereas

(1.14)  $$E[g(g_1(X))] = E[g(X^2)] = \int_{-\infty}^{\infty} g(x^2) f_X(x)\,dx.$$

One may verify directly that the integrals on the right-hand sides of (1.13)
and (1.14) are equal, as asserted by (1.11): split the integral in (1.14) at
x = 0 and substitute $y = x^2$ (so that $dx = dy/(2\sqrt{y})$) on each half-line.
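In the discrete case the content of (1.11) can be checked by direct computation. The sketch below uses a hypothetical probability mass function for X (X uniform on {−2, −1, 0, 1, 2}, not taken from the text), builds the mass function of $Y = g_1(X) = X^2$, and compares $E[g(Y)]$ computed from the law of Y with $E[g(g_1(X))]$ computed from the law of X; the helper names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction

def pmf_of_function(p_x, g1):
    """Probability mass function of Y = g1(X), built by collecting the mass of each x."""
    p_y = defaultdict(Fraction)
    for x, px in p_x.items():
        p_y[g1(x)] += px
    return dict(p_y)

def expect(pmf, g):
    """E[g(V)] for a random variable V with the given probability mass function."""
    return sum(g(v) * pv for v, pv in pmf.items())

# Hypothetical example: X uniform on {-2, -1, 0, 1, 2}, g1(x) = x**2, g(y) = y**2 + 3y.
p_x = {x: Fraction(1, 5) for x in range(-2, 3)}

def g1(x):
    return x * x

def g(y):
    return y * y + 3 * y

p_y = pmf_of_function(p_x, g1)             # {4: 2/5, 1: 2/5, 0: 1/5}
lhs = expect(p_y, g)                        # E[g(Y)] from the law of Y
rhs = expect(p_x, lambda x: g(g1(x)))       # E[g(g1(X))] from the law of X
print(lhs, rhs)                             # 64/5 64/5
```

The right-hand side never needs the law of Y, which is the practical point of (1.11).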




As one immediate consequence of (1.11), we have the following formula
for the variance of a random variable g(X), which arises as a function of
another random variable:

(1.15)  $$\operatorname{Var}[g(X)] = E[g^2(X)] - E^2[g(X)].$$

► Example 1B. The square of a normal random variable. Let X be a
normally distributed random variable with mean 0 and variance σ². Let
$Y = X^2$. Then the mean and variance of Y are given by $E[Y] = E[X^2] = \sigma^2$,
$\operatorname{Var}[Y] = E[X^4] - E^2[X^2] = 3\sigma^4 - \sigma^4 = 2\sigma^4$. ◄


If a random variable X is known to be normally distributed with mean
m and variance σ², then for brevity one often writes "X is N(m, σ²)."

► Example 1C. The logarithmic normal distribution. A random variable
X is said to have a logarithmic normal distribution if its logarithm log X
is normally distributed. One may find the mean and variance of X by
finding the mean and variance of $X = e^Y$, in which Y is N(m, σ²). Now
$E[X] = E[e^Y]$ is the value at t = 1 of the moment-generating function
$\psi_Y(t)$ of Y. Similarly, $E[X^2] = E[e^{2Y}] = \psi_Y(2)$. Since
$\psi_Y(t) = \exp(mt + \tfrac{1}{2}\sigma^2 t^2)$, it follows that $E[X] = \exp(m + \tfrac{1}{2}\sigma^2)$ and

$$\operatorname{Var}[X] = E[X^2] - E^2[X] = \exp(2m + 2\sigma^2) - \exp(2m + \sigma^2). \quad ◄$$
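The closed forms in Examples 1B and 1C can be checked numerically. The sketch below approximates expectations against the normal density with a plain midpoint rule over the truncated range m ± 12σ; the truncation width, the number of steps, and the parameter values m = 0.3, σ = 0.5 are assumptions chosen for illustration, not part of the text.

```python
import math

def normal_pdf(x, m, s):
    """Density of N(m, s**2)."""
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

def normal_expect(h, m, s, n=20_000, width=12.0):
    """Midpoint-rule approximation of E[h(Y)] for Y ~ N(m, s**2), truncated to m +/- width*s."""
    lo, hi = m - width * s, m + width * s
    step = (hi - lo) / n
    total = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * step
        total += h(y) * normal_pdf(y, m, s)
    return total * step

m, s = 0.3, 0.5   # hypothetical parameters

# Example 1C: X = exp(Y) with Y ~ N(m, s**2).
mean_x = normal_expect(math.exp, m, s)
var_x = normal_expect(lambda y: math.exp(2 * y), m, s) - mean_x ** 2
print(mean_x, math.exp(m + s ** 2 / 2))
print(var_x, math.exp(2 * m + 2 * s ** 2) - math.exp(2 * m + s ** 2))

# Example 1B: Y = X**2 with X ~ N(0, s**2), whose variance should be 2 s**4.
var_sq = normal_expect(lambda x: x ** 4, 0.0, s) - normal_expect(lambda x: x ** 2, 0.0, s) ** 2
print(var_sq, 2 * s ** 4)
```

Each printed pair should agree to many decimal places; the variances are formed exactly as in (1.15), from a second moment minus a squared mean.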


Example 1D shows how the mean (or the expectation) of a random
variable is interpreted.

► Example 1D. Disadvantageous or unfair bets. Roulette is played by
spinning a ball on a circular wheel, which has been divided into thirty-seven
… bearing numbers from 0 to 36.* Let X denote … The
probability mass function of X is given by $p_X(x) = 1/37$ for $x = 0, 1, \ldots, 36$.
Suppose that one is given even odds on a bet that the observed value of X
is an odd number; that is, …

Solution: Define a random variable Y as equal to the amount won by
betting 1 dollar on an odd outcome at a play of the game of roulette.
Then Y = 1 if X is odd and Y = −1 if X is not odd. Consequently,




P[Y = 1] = 18/37 and P[Y = −1] = 19/37. The mean E[Y] of the random
variable Y is then given by

(1.16)  $$E[Y] = 1 \cdot p_Y(1) + (-1) \cdot p_Y(-1) = -\frac{1}{37} = -0.027.$$


The amount one can expect to win at roulette by betting on an odd outcome
may be regarded as equal to the mean E[Y] in the following sense. Let
$Y_1, Y_2, \ldots, Y_n, \ldots$ be one's winnings in a succession of plays of roulette
at which one has bet on an odd outcome. It is shown in section 5 that the
average winnings $(Y_1 + Y_2 + \cdots + Y_n)/n$ in n plays tends, as the number
of plays becomes infinite, to E[Y]. The fact that E[Y] is equal to a negative
number implies that betting on an odd outcome at roulette is disadvantageous
(or unfair) for the bettor, since after a long series of plays he can
expect to have lost money at a rate of 2.7 cents per dollar bet. Many games
of chance are disadvantageous for the bettor in the sense that the mean
winnings are negative. However, the mean (or expected) winnings describe
just one aspect of what will occur in a long series of plays. For a gambler
who is interested only in a modest increase in his fortune, it is more
important to know the probability that, as a result of a series of bets on an
odd outcome in roulette, the size of his 1000-dollar fortune will increase to
1200 dollars before it decreases to zero. A home owner insures his home
against destruction by fire, even though he is making a disadvantageous
bet (in the sense that his expected money winnings are negative), because
he is more concerned with making equal to zero the probability of a large
loss. ◄
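The computation behind (1.16) is short enough to reproduce exactly. The sketch below assumes only what the example states: X is uniform on 0, 1, ..., 36 and Y = +1 when X is odd, −1 otherwise. Exact fractions make the value −1/37 visible, and the last line shows the mean translated into an expected net result over 1000 one-dollar bets.

```python
from fractions import Fraction

# Example 1D: X is uniform on 0, 1, ..., 36; Y = +1 if X is odd and Y = -1 otherwise.
p_x = {x: Fraction(1, 37) for x in range(37)}
p_y = {
    1: sum(p for x, p in p_x.items() if x % 2 == 1),
    -1: sum(p for x, p in p_x.items() if x % 2 == 0),
}
mean_y = sum(y * p for y, p in p_y.items())    # (1.16)

print(p_y[1], p_y[-1], mean_y, float(mean_y))  # 18/37 19/37 -1/37 -0.027...
print(1000 * float(mean_y))                    # expected net result of 1000 one-dollar bets
```

The zero counts as "not odd", which is exactly what tips the bet against the player.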


Most random variables encountered in applications of probability theory
have finite means and variances. However, random variables without
finite means have long been encountered by physicists in connection with
problems of return to equilibrium. The following example illustrates a
random variable of this type, one that has infinite mean.


► Example 1E. On long leads in fair games. Consider two players
engaged in a friendly game of matching pennies with fair coins. The game
is played as follows. One player tosses a coin, while the other player
guesses the outcome, winning one cent if he guesses correctly and losing
one cent if he guesses incorrectly. The two friends agree to stop playing
the moment neither is winning. Let N be the duration of the game; that
is, N is equal to the number of times coins are tossed until the players are
first even. Find E[N], the mean duration of the game.

Solution: It is clear that the game of matching pennies with fair coins is
not disadvantageous to either player in the sense that if Y is the winnings
of a given player on any play of the game then E[Y] = 0. From this fact
one may be led to the conclusion that the total winnings $S_n$ of a given



player in n plays will be equal to 0 in half the plays, over a very large
number of plays. However, no such inference can be made. Indeed,
consider the random variable N, which represents the first trial N at which
$S_N = 0$. We now show that $E[N] = \infty$; in words, the mean duration of
the game of matching pennies is infinite. Note that this does not imply
that the duration N is infinite; it may be shown that there is probability
one that in a finite number of plays the fortunes of the two players will
equalize. To compute E[N], we must compute its probability law. The
duration N of the game cannot be equal to an odd integer, since the
fortunes will equalize if and only if each player has won on exactly half
the tosses. We omit the computation of the probability that N = n, for n
an even integer, and quote here the result (see W. Feller, An Introduction
to Probability Theory and Its Applications, second edition, Wiley, New
York, 1957, p. 75):


(1.17)  $$P[N = 2m] = \frac{1}{m}\binom{2m-2}{m-1}\,2^{-(2m-1)}, \qquad m = 1, 2, \ldots$$


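The mean duration of the game is then given by

(1.18)  $$E[N] = \sum_{m=1}^{\infty} (2m)\,P[N = 2m].$$

It may be shown, using Stirling's formula, that

(1.19)  $$\frac{(2m)!}{2^{2m}(m!)^2} \sim \frac{1}{\sqrt{\pi m}},$$

the sign ~ indicating that the ratio of the two sides in (1.19) tends to 1 as m
tends to infinity. By (1.17), $(2m)P[N = 2m] = \binom{2m-2}{m-1}2^{-2(m-1)}$, which is the
left-hand side of (1.19) with m replaced by m − 1. Consequently,
$(2m)P[N = 2m] \ge K/\sqrt{m}$ for some constant K > 0. Therefore, the infinite
series in (1.18) diverges, and $E[N] = \infty$. ◄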
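The divergence in Example 1E can be watched numerically. The sketch below evaluates (1.17) exactly for small m and computes partial sums of (1.18) in floating point, using $a_j = \binom{2j}{j}/4^j$ with the recurrence $a_j = a_{j-1}(2j-1)/(2j)$, so that $(2m)P[N = 2m] = a_{m-1}$. The two closed forms used in the checks, $P[N \le 2M] = 1 - a_M$ and $\sum_{m \le M} 2m\,P[N = 2m] = (2M-1)a_{M-1}$, are standard identities that can be verified by induction; they are not stated in the text.

```python
from fractions import Fraction
from math import comb, pi, sqrt

def prob_first_tie(m):
    """P[N = 2m] as given by (1.17)."""
    return Fraction(comb(2 * m - 2, m - 1), m * 2 ** (2 * m - 1))

def partial_mean(M):
    """Floating-point partial sum of (1.18) over m = 1..M.

    Uses (2m) P[N = 2m] = a_{m-1}, where a_j = C(2j, j) / 4**j satisfies
    a_j = a_{j-1} * (2j - 1) / (2j); this avoids huge binomial coefficients.
    """
    total, a = 0.0, 1.0            # a holds a_{m-1}, starting from a_0 = 1
    for m in range(1, M + 1):
        total += a
        a *= (2 * m - 1) / (2 * m)
    return total

print(prob_first_tie(1), prob_first_tie(2), prob_first_tie(3))   # 1/2 1/8 1/16
print(float(sum(prob_first_tie(m) for m in range(1, 201))))     # creeping toward 1
for M in (10**2, 10**4, 10**5):
    print(M, partial_mean(M), partial_mean(M) / sqrt(M))         # ratio near 2/sqrt(pi)
```

The total probability approaches 1 (compare theoretical exercise 1.3), while the partial sums of the mean grow roughly like $2\sqrt{M/\pi}$ and never level off.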

To conclude this section, let us justify the requirement that the integrals
defining expectations be absolutely convergent by showing, by example,
that if the expectation of a continuous random variable X is defined by

(1.20)  $$E[X] = \lim_{a \to \infty} \int_{-a}^{a} x f_X(x)\,dx,$$

then it is not necessarily true that for any constant c

$$E[X + c] = E[X] + c.$$

Let X be a random variable whose probability density function is an
even function, that is, $f_X(-x) = f_X(x)$. Then, under the definition given




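by (1.20), the mean E[X] exists and equals 0, since $\int_{-a}^{a} x f_X(x)\,dx = 0$ for
every a. Now

(1.21)  $$E[X + c] = \lim_{a \to \infty} \int_{-a}^{a} y f_X(y - c)\,dy.$$

Assuming c > 0, and letting u = y − c, we may write (for a > c)

$$\int_{-a}^{a} y f_X(y - c)\,dy = \int_{-a-c}^{a-c} (u + c) f_X(u)\,du = \int_{-a+c}^{a-c} u f_X(u)\,du - \int_{a-c}^{a+c} u f_X(u)\,du + c \int_{-a-c}^{a-c} f_X(u)\,du.$$

The middle term comes from $\int_{-a-c}^{-a+c} u f_X(u)\,du$ by substituting u → −u and
using the evenness of $f_X$. The first of these integrals vanishes, and the last
tends to 1 as a tends to ∞.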
Consequently, to prove that if E[X] is defined by (1.20) one can find a
random variable X and a constant c such that $E[X + c] \ne E[X] + c$, it
suffices to prove that one can find an even probability density function
f(·) and a constant c > 0 such that

(1.22)  it is not so that  $$\lim_{a \to \infty} \int_{a-c}^{a+c} u f(u)\,du = 0.$$


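An example of a continuous even probability density function satisfying
(1.22) is the following. Letting $A = 3/\pi^2$, define

(1.23)  $$f(x) = \begin{cases} \dfrac{A}{|k|}\,(1 - |k - x|) & \text{if } k = \pm 1, \pm 2^2, \pm 3^2, \ldots \text{ is such that } |k - x| < 1, \\ 0 & \text{elsewhere.} \end{cases}$$

In words, f(·) vanishes, except for points x which lie within a distance 1
from a point that in absolute value is a perfect square 1, 2², 3², ....
That f(·) is a probability density function follows from the fact that

$$\int_{-\infty}^{\infty} f(x)\,dx = 2A \sum_{m=1}^{\infty} \frac{1}{m^2} = 2A\,\frac{\pi^2}{6} = 1.$$

That (1.22) holds for c > 1 follows from the fact that for $k = 1^2, 2^2, 3^2, \ldots$
with k ≥ c (so that u f(u) ≥ 0 over the interval of integration)

$$\int_{k-c}^{k+c} u f(u)\,du \ge \int_{k-1}^{k+1} u f(u)\,du \ge (k - 1)\int_{k-1}^{k+1} f(u)\,du = \frac{(k - 1)A}{k},$$

which tends to A, not to 0, as k tends to ∞.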
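The counterexample can be probed numerically. The sketch below implements the density (1.23) as reconstructed above (a triangle of height A/k on (k − 1, k + 1) for each perfect square k, mirrored to negative x), approximates the window integral of (1.22) with a midpoint rule, and evaluates the normalizing series exactly term by term. The step count, the choice c = 1.5, and the helper names are illustrative assumptions.

```python
import math

A = 3 / math.pi ** 2   # normalizing constant of (1.23)

def f(x):
    """Density (1.23): a triangle of height A/k on (k - 1, k + 1) for k = 1, 4, 9, ..., mirrored to x < 0."""
    y = abs(x)
    m = round(math.sqrt(y))
    for j in (m - 1, m, m + 1):          # the only perfect squares that can lie within 1 of y
        k = j * j
        if j >= 1 and abs(y - k) < 1:
            return A / k * (1 - abs(y - k))
    return 0.0

def midpoint(g, lo, hi, n=20_000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

def window(k, c):
    """Approximates the integral of u f(u) over (k - c, k + c), the quantity appearing in (1.22)."""
    return midpoint(lambda u: u * f(u), k - c, k + c)

def mass_up_to(M):
    """Total area of the triangles centred at +-1, +-4, ..., +-M**2, namely 2A * sum 1/m**2."""
    return 2 * A * sum(1 / m ** 2 for m in range(1, M + 1))

for m in (2, 5, 10, 20):
    print(m * m, window(m * m, 1.5))   # stays near A = 0.30396..., it does not shrink to 0
print(mass_up_to(100_000))             # approaches 1
```

Each triangle carries mass A/k and is centred at k, so the window integral is essentially A for every large square; this is why the symmetric limit (1.20) cannot be trusted to respect shifts.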


THEORETICAL EXERCISES 


1.1. The mean and variance of a linear function of a random variable. Let X
be a random variable with finite mean and variance. Let a and b be real
numbers. Show that

(1.24)  $$E[aX + b] = aE[X] + b, \qquad \operatorname{Var}[aX + b] = |a|^2 \operatorname{Var}[X], \qquad \sigma[aX + b] = |a|\,\sigma[X], \qquad \psi_{aX+b}(t) = e^{bt}\,\psi_X(at).$$
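The four identities in (1.24) can be spot-checked on a small example before proving them. The sketch below uses a hypothetical three-point distribution for X and the constants a = −2, b = 5 (none of these come from the text); means and variances are exact fractions, while the moment-generating functions are compared in floating point at t = 0.3.

```python
import math
from fractions import Fraction

def mean_var(pmf):
    """Exact mean and variance of a finite pmf {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    var = sum((x - mean) ** 2 * p for x, p in pmf.items())
    return mean, var

def mgf(pmf, t):
    """Moment-generating function E[exp(tX)] of a finite pmf, in floating point."""
    return sum(p * math.exp(t * x) for x, p in pmf.items())

# Hypothetical pmf for X and constants a, b (not from the text).
p_x = {0: Fraction(1, 4), 1: Fraction(1, 2), 3: Fraction(1, 4)}
a, b = Fraction(-2), Fraction(5)
p_y = {a * x + b: p for x, p in p_x.items()}   # pmf of aX + b (a != 0, so values stay distinct)

mx, vx = mean_var(p_x)
my, vy = mean_var(p_y)
print(my, a * mx + b)                                   # 5/2 5/2
print(vy, a ** 2 * vx)                                  # 19/4 19/4
t = 0.3
print(mgf(p_y, t), math.exp(b * t) * mgf(p_x, a * t))   # agree up to rounding
```

A negative a is chosen deliberately, since it is the case where the absolute value in $\sigma[aX + b] = |a|\,\sigma[X]$ matters.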


1.2. Chebyshev's inequality for random variables. Let X be a random variable
with finite mean and variance. Show that for any h > 0 and any ε > 0

(1.25)  $$P\big[|X - E[X]| \le h\sigma[X]\big] \ge 1 - \frac{1}{h^2}, \qquad P\big[|X - E[X]| > h\sigma[X]\big] \le \frac{1}{h^2},$$

$$P\big[|X - E[X]| \le \epsilon\big] \ge 1 - \frac{\sigma^2[X]}{\epsilon^2}, \qquad P\big[|X - E[X]| > \epsilon\big] \le \frac{\sigma^2[X]}{\epsilon^2}.$$

Hint: $P\big[|X - E[X]| \le h\sigma[X]\big] = F_X(E[X] + h\sigma[X]) - F_X(E[X] - h\sigma[X])$
if $F_X(\cdot)$ is continuous at these points.
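Before proving (1.25) it may help to see how loose or tight the bound is on a concrete law. The sketch below takes a hypothetical example, X = number of heads in 10 tosses of a fair coin (not from the text), computes the exact tail probability $P[|X - E[X]| > h\sigma[X]]$, and compares it with $1/h^2$. Squared deviations are compared with $h^2\sigma^2[X]$ so that everything stays exact.

```python
from fractions import Fraction
from math import comb

def chebyshev_check(pmf, h):
    """Return the exact P[|X - E[X]| > h*sigma[X]] and the bound 1/h**2 of (1.25)."""
    mean = sum(x * p for x, p in pmf.items())
    var = sum((x - mean) ** 2 * p for x, p in pmf.items())
    # Compare squared deviations with h**2 * var so everything stays exact (no square roots).
    tail = sum((p for x, p in pmf.items() if (x - mean) ** 2 > h ** 2 * var), Fraction(0))
    return tail, Fraction(1) / h ** 2

# Hypothetical example: X = number of heads in 10 tosses of a fair coin.
p_x = {k: Fraction(comb(10, k), 2 ** 10) for k in range(11)}
for h in (Fraction(3, 2), 2, 3):
    tail, bound = chebyshev_check(p_x, h)
    print(h, tail, bound, tail <= bound)
```

For h = 2 the exact tail is 11/512, far below the bound 1/4; Chebyshev's inequality trades sharpness for holding under every law with finite variance.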


1.3. Continuation of example 1E. Using (1.17), show that $P[N < \infty] = 1$.


EXERCISES 


1.1. Consider a gambler who is to win 1 dollar if a 6 appears when a fair die is
tossed; otherwise he wins nothing. Find the mean and variance of his
winnings.

1.2. Suppose that 0.008 is the probability of death within a year of a man aged
35. Find the mean and variance of the number of deaths within a year
among 20,000 men of this age.

1.3. Consider a man who buys a lottery ticket in a lottery that sells 100 tickets
and that gives 4 prizes of 200 dollars, 10 prizes of 100 dollars, and 20 prizes
of 10 dollars. How much should the man be willing to pay for a ticket in
this lottery?

1.4. Would you pay 1 dollar to buy a ticket in a lottery that sells 1,000,000
tickets and gives 1 prize of 100,000 dollars, 10 prizes of 10,000 dollars, and
100 prizes of 1000 dollars?

1.5. Nine dimes and a silver dollar are in a red purse, and 10 dimes are in a
black purse. Five coins are selected without replacement from the red
purse and placed in the black purse. Then 5 coins are selected without
replacement from the black purse and placed in the red purse. The amount
of money in the red purse at the end of this experiment is a random variable.
What are its mean and variance?

1.6. … throw the bank pays the player 2 dollars, … time on the third throw the
… 4 = 2² dollars. In general, if heads appears for the first time on the nth
throw … dollars. The amount of money … variable; find its mean. …
willing to pay this amount to play the game? (For a discussion of this
problem … called a paradox, see T. C. Fry, Probability and Its Engineering
Uses, Van Nostrand, New York, 1928, pp. 194-199.)


1.7. The output of a certain manufacturer (it may be radio tubes, textiles,
canned goods, etc.) is graded into 5 grades, labeled A5, A4, A3, A2, and A1
(in decreasing order of quality). The manufacturer's profit, denoted by X,
on an item depends on the grade of the item, as indicated in the table. The
grade of an item is random; however, the proportions of the manufacturer's
output in the various grades are known and are given in the table
below. Find the mean and variance of X, in which X denotes the manufacturer's
profit on an item selected randomly from his production.

Grade of    Profit on an Item    Probability that an
an Item     of This Grade        Item Is of This Grade
A5          $1.00                …
A4           0.80                …
A3           0.60                …
A2           0.00                …
A1          −0.60                …

1.8. Consider a person who commutes to the city from a suburb by train. He
is accustomed to leaving his home between 7:30 and 8:00 A.M. The drive
to the railroad station takes between 20 and 30 minutes. Assume that the
departure time and length of trip are independent random variables, each
uniformly distributed over their respective intervals. There are 3 trains
that he can take, which leave the station and arrive in the city precisely
on time. The first train leaves at 8:05 A.M. and arrives at 8:40 A.M., the
second leaves at 8:25 A.M. and arrives at 8:55 A.M., the third leaves at
9:00 A.M. and arrives at 9:43 A.M.

(i) Find the mean and variance of his time of arrival in the city.

(ii) Find the mean and variance of his time of arrival under the assumption
that he leaves his home between 7:30 and 7:55 A.M.

1.9. Two athletic teams play a series of games; the first team to win 4 games
is the winner. Suppose that one of the teams is stronger than the other
and has probability p [equal to (i) 0.5, (ii) …] of winning each game,
independent of the outcomes of any other game. Assume that a game
cannot end in a tie. Find the mean and variance of the number of games
required to conclude the series. (Use exercise 3.26 of Chapter 3.)

1.10. Consider an experiment that consists of N players independently tossing
fair coins. Let A be the event that there is an "odd" man (that is, either
exactly one of the coins falls heads or exactly one of the coins falls tails).
For r = 1, 2, ... let $X_r$ be the number of times the experiment is repeated
until the event A occurs for the rth time.

(i) Find the mean and variance of $X_r$.

(ii) Evaluate $E[X_r]$ and $\operatorname{Var}[X_r]$ for N = 3, 4, 5 and r = 1, 2, 3.

1.11. Let an urn contain 5 balls, numbered 1 to 5. Let a sample of size 3 be
drawn with replacement (without replacement) from the urn and let X be
the largest number in the sample. Find the mean and variance of X.




1.12. Let X be N(m, σ²). Find the mean and variance of (i) |X|, (ii) |X − c|,
where (a) c is a given constant, (b) σ = m = c = 1, (c) σ = m = 1, c = 2.

1.13. Let X and Y be independent random variables, each N(0, 1). Find the
mean and variance of $\sqrt{X^2 + Y^2}$.

1.14. Find the mean and variance of a random variable X that obeys the
probability law of Laplace, specified by the probability density function,
for some constants α and β > 0:

$$f_X(x) = \frac{1}{2\beta}\exp\left(-\frac{|x - \alpha|}{\beta}\right), \qquad -\infty < x < \infty.$$

1.15. The velocity v of a molecule with mass m in a gas at absolute temperature
T is a random variable obeying the Maxwell-Boltzmann law:

$$f_v(x) = \frac{4}{\sqrt{\pi}}\,\beta^{3/2} x^2 e^{-\beta x^2}, \quad x > 0; \qquad f_v(x) = 0, \quad x < 0,$$

in which β = m/(2kT), k = Boltzmann's constant. Find the mean and
variance of (i) the velocity of a molecule, (ii) the kinetic energy $E = \tfrac{1}{2}mv^2$
of a molecule.


2. EXPECTATIONS OF JOINTLY DISTRIBUTED 
RANDOM VARIABLES 


Consider two jointly distributed random variables $X_1$ and $X_2$. The
expectation $E_{X_1,X_2}[g(x_1, x_2)]$ of a function $g(x_1, x_2)$ of two real variables is
defined as follows:

If the random variables $X_1$ and $X_2$ are jointly continuous, with joint
probability density function $f_{X_1,X_2}(x_1, x_2)$, then

(2.1)  $$E_{X_1,X_2}[g(x_1, x_2)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x_1, x_2)\, f_{X_1,X_2}(x_1, x_2)\,dx_1\,dx_2.$$

If the random variables $X_1$ and $X_2$ are jointly discrete, with joint
probability mass function $p_{X_1,X_2}(x_1, x_2)$, then

(2.2)  $$E_{X_1,X_2}[g(x_1, x_2)] = \sum_{\substack{\text{over all } (x_1, x_2) \text{ such} \\ \text{that } p_{X_1,X_2}(x_1, x_2) > 0}} g(x_1, x_2)\, p_{X_1,X_2}(x_1, x_2).$$

If the random variables $X_1$ and $X_2$ have joint distribution function
$F_{X_1,X_2}(x_1, x_2)$, then

(2.3)  $$E_{X_1,X_2}[g(x_1, x_2)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x_1, x_2)\,dF_{X_1,X_2}(x_1, x_2),$$

where the two-dimensional Stieltjes integral may be defined in a manner
similar to that in which the one-dimensional Stieltjes integral was defined
in section 6 of Chapter 5.


SEC. 2 EXPECTATIONS OF JOINTLY DISTRIBUTED RANDOM VARIABLES 355 


On the other hand, $g(X_1, X_2)$ is a random variable, with expectation

(2.4)  $$E[g(X_1, X_2)] = \int_{-\infty}^{\infty} y\,dF_{g(X_1,X_2)}(y) = \int_{-\infty}^{\infty} y\,f_{g(X_1,X_2)}(y)\,dy = \sum_{\substack{\text{over all points } y \\ \text{where } p_{g(X_1,X_2)}(y) > 0}} y\,p_{g(X_1,X_2)}(y),$$

depending on whether the probability law of $g(X_1, X_2)$ is specified by its
distribution function, probability density function, or probability mass
function.

It is a basic fact of probability theory that for any jointly distributed
random variables $X_1$ and $X_2$ and any Borel function $g(x_1, x_2)$

(2.5)  $$E[g(X_1, X_2)] = E_{X_1,X_2}[g(x_1, x_2)]$$

in the sense that if either of the expectations in (2.5) exists then so does the
other, and the two are equal. A rigorous proof of (2.5) is beyond the
scope of this book.

In view of (2.5) we have two ways of computing the expectation of a
function of jointly distributed random variables. Equation (2.5) generalizes
(1.5). Similarly, (1.11) may also be generalized.

Let $X_1$, $X_2$, and Y be random variables such that $Y = g_1(X_1, X_2)$ for
some Borel function $g_1(x_1, x_2)$. Then for any Borel function g(·)

(2.6)  $$E[g(Y)] = E[g(g_1(X_1, X_2))].$$

The most important property possessed by the operation of expectation
of a random variable is its linearity property: if $X_1$ and $X_2$ are jointly
distributed random variables with finite expectations $E[X_1]$ and $E[X_2]$,
then the sum $X_1 + X_2$ has a finite expectation given by

(2.7)  $$E[X_1 + X_2] = E[X_1] + E[X_2].$$

Let us sketch a proof of (2.7) in the case that $X_1$ and $X_2$ are jointly
continuous. The reader may gain some idea of how (2.7) is proved in
general by consulting the proof of (6.22) in Chapter 2.

From (2.5) it follows that

(2.7′)  $$E[X_1 + X_2] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x_1 + x_2)\, f_{X_1,X_2}(x_1, x_2)\,dx_1\,dx_2.$$

Now

(2.7″)  $$\int_{-\infty}^{\infty} dx_1 \int_{-\infty}^{\infty} dx_2\; x_1 f_{X_1,X_2}(x_1, x_2) = \int_{-\infty}^{\infty} dx_1\; x_1 f_{X_1}(x_1) = E[X_1],$$

$$\int_{-\infty}^{\infty} dx_2 \int_{-\infty}^{\infty} dx_1\; x_2 f_{X_1,X_2}(x_1, x_2) = \int_{-\infty}^{\infty} dx_2\; x_2 f_{X_2}(x_2) = E[X_2].$$




The integral on the right-hand side of (2.7′) is equal to the sum of the
integrals on the left-hand sides of (2.7″). The proof of (2.7) is now complete.

The moments and moment-generating function of jointly distributed
random variables are defined by a direct generalization of the definitions
given for a single random variable. For any two nonnegative integers $n_1$
and $n_2$, we define

(2.8)  $$m_{n_1,n_2} = E[X_1^{n_1} X_2^{n_2}]$$

as a moment of the jointly distributed random variables $X_1$ and $X_2$. The
sum $n_1 + n_2$ is called the order of the moment. For the moments of
orders 1 and 2 we have the following names: $m_{1,0}$ and $m_{0,1}$ are, respectively,
the means of $X_1$ and $X_2$, whereas $m_{2,0}$ and $m_{0,2}$ are, respectively, the mean
squares of $X_1$ and $X_2$. The moment $m_{1,1} = E[X_1X_2]$ is called the product
moment.

We next define the central moments of the random variables $X_1$ and $X_2$.
For any two nonnegative integers $n_1$ and $n_2$, we define

(2.8′)  $$\mu_{n_1,n_2} = E\big[(X_1 - E[X_1])^{n_1}(X_2 - E[X_2])^{n_2}\big]$$

as a central moment of order $n_1 + n_2$. We are again particularly interested
in the central moments of orders 1 and 2. The central moments $\mu_{1,0}$ and
$\mu_{0,1}$ of order 1 both vanish, whereas $\mu_{2,0}$ and $\mu_{0,2}$ are, respectively, the
variances of $X_1$ and $X_2$. The central moment $\mu_{1,1}$ is called the covariance
of the random variables $X_1$ and $X_2$ and is written $\operatorname{Cov}[X_1, X_2]$; in symbols,

(2.9)  $$\operatorname{Cov}[X_1, X_2] = \mu_{1,1} = E\big[(X_1 - E[X_1])(X_2 - E[X_2])\big].$$

We leave it to the reader to prove that the covariance is equal to the product
moment minus the product of the means; in symbols,

(2.10)  $$\operatorname{Cov}[X_1, X_2] = E[X_1X_2] - E[X_1]\,E[X_2].$$
The covariance derives its importance from the role it plays in the basic
formula for the variance of the sum of two random variables:

(2.11)  $$\operatorname{Var}[X_1 + X_2] = \operatorname{Var}[X_1] + \operatorname{Var}[X_2] + 2\operatorname{Cov}[X_1, X_2].$$

To prove (2.11), we write

$$\operatorname{Var}[X_1 + X_2] = E[(X_1 + X_2)^2] - E^2[X_1 + X_2] = E[X_1^2] - E^2[X_1] + E[X_2^2] - E^2[X_2] + 2\big(E[X_1X_2] - E[X_1]\,E[X_2]\big),$$

from which (2.11) follows by (1.8) and (2.10).
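Formula (2.11) can be checked exactly on a small joint law. The sketch below uses a hypothetical joint probability mass function on {0, 1} × {0, 1} (not from the text) chosen so that the covariance is nonzero; every expectation is computed directly from the joint mass function, as (2.2) and (2.5) permit.

```python
from fractions import Fraction

# Hypothetical joint pmf of (X1, X2) on {0, 1} x {0, 1}, chosen so that X1 and X2 are correlated.
joint = {(0, 0): Fraction(3, 10), (0, 1): Fraction(1, 10),
         (1, 0): Fraction(1, 10), (1, 1): Fraction(1, 2)}

def E(g):
    """E[g(X1, X2)] computed directly from the joint pmf, as in (2.2) and (2.5)."""
    return sum(g(x1, x2) * p for (x1, x2), p in joint.items())

mean1, mean2 = E(lambda x1, x2: x1), E(lambda x1, x2: x2)
var1 = E(lambda x1, x2: x1 ** 2) - mean1 ** 2                      # (1.8)
var2 = E(lambda x1, x2: x2 ** 2) - mean2 ** 2
cov = E(lambda x1, x2: x1 * x2) - mean1 * mean2                   # (2.10)
var_sum = E(lambda x1, x2: (x1 + x2) ** 2) - E(lambda x1, x2: x1 + x2) ** 2
print(var_sum, var1 + var2 + 2 * cov)   # 19/25 19/25, as (2.11) asserts
```

Dropping the term 2 Cov here would give 12/25 instead of 19/25, which shows how much the covariance contributes when the variables move together.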




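The joint moment-generating function is defined for any two real
numbers, $t_1$ and $t_2$, by

$$\psi_{X_1,X_2}(t_1, t_2) = E\big[e^{t_1X_1 + t_2X_2}\big].$$

The moments can be read off from the power-series expansion of the
moment-generating function, since formally

(2.12)  $$\psi_{X_1,X_2}(t_1, t_2) = \sum_{n_1=0}^{\infty}\sum_{n_2=0}^{\infty} \frac{t_1^{n_1}\,t_2^{n_2}}{n_1!\,n_2!}\,E[X_1^{n_1}X_2^{n_2}].$$

In particular, the means, variances, and covariance of $X_1$ and $X_2$ may be
expressed in terms of the derivatives of the moment-generating function:

(2.13)  $$E[X_1] = \frac{\partial}{\partial t_1}\psi_{X_1,X_2}(0, 0), \qquad E[X_2] = \frac{\partial}{\partial t_2}\psi_{X_1,X_2}(0, 0),$$

(2.14)  $$E[X_1^2] = \frac{\partial^2}{\partial t_1^2}\psi_{X_1,X_2}(0, 0), \qquad E[X_2^2] = \frac{\partial^2}{\partial t_2^2}\psi_{X_1,X_2}(0, 0),$$

(2.15)  $$E[X_1X_2] = \frac{\partial^2}{\partial t_1\,\partial t_2}\psi_{X_1,X_2}(0, 0),$$

(2.16)  $$\operatorname{Var}[X_1] = \frac{\partial^2}{\partial t_1^2}\psi_{X_1-m_1,X_2-m_2}(0, 0), \qquad \operatorname{Var}[X_2] = \frac{\partial^2}{\partial t_2^2}\psi_{X_1-m_1,X_2-m_2}(0, 0),$$

(2.17)  $$\operatorname{Cov}[X_1, X_2] = \frac{\partial^2}{\partial t_1\,\partial t_2}\psi_{X_1-m_1,X_2-m_2}(0, 0),$$

in which $m_1 = E[X_1]$, $m_2 = E[X_2]$.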


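► Example 2A. The joint moment-generating function and covariance
of jointly normal random variables. Let $X_1$ and $X_2$ be jointly normally
distributed random variables with a joint probability density function

(2.18)  $$f_{X_1,X_2}(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x_1 - m_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x_1 - m_1}{\sigma_1}\right)\left(\frac{x_2 - m_2}{\sigma_2}\right) + \left(\frac{x_2 - m_2}{\sigma_2}\right)^2\right]\right\}.$$

The joint moment-generating function is given by

(2.19)  $$\psi_{X_1,X_2}(t_1, t_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{t_1x_1 + t_2x_2}\, f_{X_1,X_2}(x_1, x_2)\,dx_1\,dx_2.$$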


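To evaluate the integral in (2.19), let us note that since

$$u_1^2 - 2\rho u_1u_2 + u_2^2 = (1 - \rho^2)u_1^2 + (u_2 - \rho u_1)^2,$$

we may write

(2.20)  $$f_{X_1,X_2}(x_1, x_2) = \frac{1}{\sigma_1}\,\phi\!\left(\frac{x_1 - m_1}{\sigma_1}\right)\frac{1}{\sigma_2\sqrt{1-\rho^2}}\,\phi\!\left(\frac{x_2 - m_2 - (\sigma_2/\sigma_1)\rho(x_1 - m_1)}{\sigma_2\sqrt{1-\rho^2}}\right),$$

in which $\phi(u) = \frac{1}{\sqrt{2\pi}}e^{-u^2/2}$ is the normal density function. Using our
knowledge of the moment-generating function of a normal law, we may
perform the integration with respect to the variable $x_2$ in the integral in
(2.19). We thus determine that $\psi_{X_1,X_2}(t_1, t_2)$ is equal to

(2.21)  $$\int_{-\infty}^{\infty} \frac{1}{\sigma_1}\,\phi\!\left(\frac{x_1 - m_1}{\sigma_1}\right)\exp(t_1x_1)\exp\left\{t_2\left[m_2 + \frac{\sigma_2}{\sigma_1}\rho(x_1 - m_1)\right]\right\}dx_1 \times \exp\left[\tfrac{1}{2}t_2^2\sigma_2^2(1 - \rho^2)\right]$$

$$= \exp\left[\tfrac{1}{2}t_2^2\sigma_2^2(1 - \rho^2) + t_2m_2 - t_2\frac{\sigma_2}{\sigma_1}\rho m_1\right] \times \exp\left[m_1\left(t_1 + t_2\frac{\sigma_2}{\sigma_1}\rho\right) + \tfrac{1}{2}\sigma_1^2\left(t_1 + t_2\frac{\sigma_2}{\sigma_1}\rho\right)^2\right].$$

By combining terms in (2.21), we finally obtain that

(2.22)  $$\psi_{X_1,X_2}(t_1, t_2) = \exp\left[t_1m_1 + t_2m_2 + \tfrac{1}{2}\left(\sigma_1^2t_1^2 + 2\rho\sigma_1\sigma_2t_1t_2 + \sigma_2^2t_2^2\right)\right].$$

The covariance is given by

(2.23)  $$\operatorname{Cov}[X_1, X_2] = \frac{\partial^2}{\partial t_1\,\partial t_2}\left.e^{-(t_1m_1 + t_2m_2)}\,\psi_{X_1,X_2}(t_1, t_2)\right|_{t_1 = t_2 = 0} = \rho\sigma_1\sigma_2.$$

Thus, if two random variables are jointly normally distributed, their joint
probability law is completely determined from a knowledge of their first and
second moments, since $m_1 = E[X_1]$, $m_2 = E[X_2]$, $\sigma_1^2 = \operatorname{Var}[X_1]$,
$\sigma_2^2 = \operatorname{Var}[X_2]$, $\rho\sigma_1\sigma_2 = \operatorname{Cov}[X_1, X_2]$. ◄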
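The derivative formulas (2.15) and (2.23) can be checked against the closed form (2.22) with finite differences. The sketch below uses hypothetical parameter values and a central-difference estimate of the mixed partial derivative at the origin with step h = 1e-4 (an assumption; any small step with acceptable rounding error would do). The centred version should return $\rho\sigma_1\sigma_2$; the uncentred one should return the product moment $\rho\sigma_1\sigma_2 + m_1m_2$.

```python
import math

def normal_mgf(t1, t2, m1, m2, s1, s2, rho):
    """Joint moment-generating function (2.22) of a bivariate normal law."""
    quad = s1 ** 2 * t1 ** 2 + 2 * rho * s1 * s2 * t1 * t2 + s2 ** 2 * t2 ** 2
    return math.exp(t1 * m1 + t2 * m2 + 0.5 * quad)

def mixed_partial(f, h=1e-4):
    """Central-difference estimate of the mixed partial d^2 f / (dt1 dt2) at (0, 0)."""
    return (f(h, h) - f(h, -h) - f(-h, h) + f(-h, -h)) / (4 * h * h)

m1, m2, s1, s2, rho = 1.0, -2.0, 1.5, 0.5, 0.6   # hypothetical parameter values

def centred(t1, t2):
    # Multiplying by exp(-(t1 m1 + t2 m2)) gives the mgf of (X1 - m1, X2 - m2), as in (2.17) and (2.23).
    return math.exp(-(t1 * m1 + t2 * m2)) * normal_mgf(t1, t2, m1, m2, s1, s2, rho)

def raw(t1, t2):
    return normal_mgf(t1, t2, m1, m2, s1, s2, rho)

print(mixed_partial(centred), rho * s1 * s2)          # covariance: both about 0.45
print(mixed_partial(raw), rho * s1 * s2 + m1 * m2)    # product moment (2.15): both about -1.55
```

The gap between the two printed lines is exactly $m_1m_2$, consistent with (2.10).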


The foregoing notions may be extended to the case of n jointly distributed
random variables $X_1, X_2, \ldots, X_n$. For any Borel function
$g(x_1, x_2, \ldots, x_n)$ of n real variables, the expectation $E[g(X_1, X_2, \ldots, X_n)]$
of the random variable $g(X_1, X_2, \ldots, X_n)$ may be expressed in terms of
the joint probability law of $X_1, X_2, \ldots, X_n$.




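If $X_1, X_2, \ldots, X_n$ are jointly continuous, with a joint probability
density function $f_{X_1,X_2,\ldots,X_n}(x_1, x_2, \ldots, x_n)$, it may be shown that

(2.24)  $$E[g(X_1, X_2, \ldots, X_n)] = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} g(x_1, x_2, \ldots, x_n)\, f_{X_1,X_2,\ldots,X_n}(x_1, x_2, \ldots, x_n)\,dx_1\,dx_2\cdots dx_n.$$

If $X_1, X_2, \ldots, X_n$ are jointly discrete, with a joint probability mass
function $p_{X_1,X_2,\ldots,X_n}(x_1, x_2, \ldots, x_n)$, it may be shown that

(2.25)  $$E[g(X_1, X_2, \ldots, X_n)] = \sum_{\substack{\text{over all } (x_1, \ldots, x_n) \text{ such that} \\ p_{X_1,\ldots,X_n}(x_1, \ldots, x_n) > 0}} g(x_1, x_2, \ldots, x_n)\, p_{X_1,X_2,\ldots,X_n}(x_1, x_2, \ldots, x_n).$$

The joint moment-generating function of n jointly distributed random
variables is defined by

(2.26)  $$\psi_{X_1,X_2,\ldots,X_n}(t_1, t_2, \ldots, t_n) = E\big[e^{t_1X_1 + t_2X_2 + \cdots + t_nX_n}\big].$$

It may also be proved that if $X_1, X_2, \ldots, X_n$ and Y are random
variables such that $Y = g_1(X_1, X_2, \ldots, X_n)$ for some Borel function
$g_1(x_1, x_2, \ldots, x_n)$ of n real variables, then for any Borel function g(·) of
one real variable

(2.27)  $$E[g(Y)] = E[g(g_1(X_1, X_2, \ldots, X_n))].$$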


THEORETICAL EXERCISES 


2.1. Linearity property of the expectation operation. Let $X_1$ and $X_2$ be jointly
discrete random variables with finite means. Show that (2.7) holds.


2.2. Let $X_1$ and $X_2$ be jointly distributed random variables whose joint
moment-generating function has a logarithm given by

(2.28)  $$\log \psi_{X_1,X_2}(t_1, t_2) = \nu \int_{-\infty}^{\infty} du \int_{-\infty}^{\infty} dy\, f_Y(y)\left\{e^{y[t_1W_1(u) + t_2W_2(u)]} - 1\right\},$$

in which Y is a random variable with probability density function $f_Y(\cdot)$,
$W_1(\cdot)$ and $W_2(\cdot)$ are known functions, and ν > 0. Show that

(2.29)  $$E[X_1] = \nu E[Y]\int_{-\infty}^{\infty} W_1(u)\,du, \qquad E[X_2] = \nu E[Y]\int_{-\infty}^{\infty} W_2(u)\,du,$$

$$\operatorname{Var}[X_1] = \nu E[Y^2]\int_{-\infty}^{\infty} W_1^2(u)\,du, \qquad \operatorname{Var}[X_2] = \nu E[Y^2]\int_{-\infty}^{\infty} W_2^2(u)\,du,$$

$$\operatorname{Cov}[X_1, X_2] = \nu E[Y^2]\int_{-\infty}^{\infty} W_1(u)\,W_2(u)\,du.$$

Moment-generating functions of the form of (2.28) play an important role
in the theory of the phenomenon of shot noise in radio tubes.


2.3. The random telegraph signal. For t > 0 let $X(t) = U(-1)^{N(t)}$, where U
is a discrete random variable such that $P[U = 1] = P[U = -1] = \tfrac{1}{2}$, and
$\{N(t), t > 0\}$ is a family of random variables such that N(0) = 0, and for
any times $t_1 < t_2$ the random variables U, $N(t_1)$, and $N(t_2) - N(t_1)$ are
independent. For any $t_1 < t_2$, suppose that $N(t_2) - N(t_1)$ obeys (i) a
Poisson probability law with parameter $\lambda = \nu(t_2 - t_1)$, (ii) a binomial
probability law with parameters p and $n = t_2 - t_1$. Show that $E[X(t)] = 0$
for any t > 0, and for any t ≥ 0, τ > 0

(2.30)  $$E[X(t)\,X(t + \tau)] = e^{-2\nu\tau} \quad \text{(Poisson case)}, \qquad = (1 - 2p)^{\tau} \quad \text{(binomial case)}.$$

Regarded as a random function of time, X(t) is called a "random telegraph
signal." Note: in the binomial case, t takes only integer values.


EXERCISES 


2.1. An ordered sample of size 5 is drawn without replacement from an urn
containing 8 white balls and 4 black balls. For j = 1, 2, ..., 5 let $X_j$ be
equal to 1 or 0, depending on whether the ball drawn on the jth draw is
white or black. Find $E[X_j]$, $\sigma^2[X_j]$, $\operatorname{Cov}[X_1, X_2]$, $\operatorname{Cov}[X_1, X_3]$.

2.2. An urn contains 12 balls, of which 8 are white and 4 are black. A ball is
drawn and its color noted. The ball drawn is then replaced; at the same
time 2 balls of the same color as the ball drawn are added to the urn. The
process is repeated until 5 balls have been drawn. For j = 1, 2, ..., 5
let $X_j$ be equal to 1 or 0, depending on whether the ball drawn on the jth
draw is white or black. Find $E[X_j]$, $\sigma^2[X_j]$, $\operatorname{Cov}[X_1, X_2]$.

2.3. Let $X_1$ and $X_2$ be the coordinates of 2 points randomly chosen on the unit
interval. Let $Y = |X_1 - X_2|$ be the distance between the points. Find the
mean, variance, and third and fourth moments of Y.

2.4. Let $X_1$ and $X_2$ be independent identically normally distributed random
variables, with mean m and variance σ². Find the mean of the random
variable $Y = \max(X_1, X_2)$. Hint: for any real numbers $x_1$ and $x_2$, show
and use the fact that $2\max(x_1, x_2) = |x_1 - x_2| + x_1 + x_2$.

2.5. Let $X_1$ and $X_2$ be jointly normally distributed with mean 0, variance 1,
and covariance ρ. Find $E[\max(X_1, X_2)]$.

2.6. Let $X_1$ and $X_2$ have a joint moment-generating function

$$\psi_{X_1,X_2}(t_1, t_2) = a(e^{t_1+t_2} + 1) + b(e^{t_1} + e^{t_2}),$$

in which a and b are positive constants such that 2a + 2b = 1. Find
$E[X_1]$, $E[X_2]$, $\operatorname{Var}[X_1]$, $\operatorname{Var}[X_2]$, $\operatorname{Cov}[X_1, X_2]$.


SEC. 3 UNCORRELATED AND INDEPENDENT RANDOM VARIABLES 361 


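2.7. Let $X_1$ and $X_2$ have a joint moment-generating function

$$\psi_{X_1,X_2}(t_1, t_2) = [a(e^{t_1+t_2} + 1) + b(e^{t_1} + e^{t_2})]^{\ldots}$$

in which a and b are positive constants such that 2a + 2b = 1. Find
$E[X_1]$, $E[X_2]$, $\operatorname{Var}[X_1]$, $\operatorname{Var}[X_2]$, $\operatorname{Cov}[X_1, X_2]$.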

2.8. Let $X_1$ and $X_2$ be jointly distributed random variables whose joint
moment-generating function has a logarithm given by (2.28), with ν = 4, Y uniformly
distributed over the interval −1 to 1, and

$$W_1(u) = e^{\ldots},\ u \ge a_1; \quad W_1(u) = 0,\ u < a_1; \qquad W_2(u) = e^{\ldots},\ u \ge a_2; \quad W_2(u) = 0,\ u < a_2,$$

in which $a_1$, $a_2$ are given constants such that $0 < a_1 < a_2$. Find $E[X_1]$,
$E[X_2]$, $\operatorname{Var}[X_1]$, $\operatorname{Var}[X_2]$, $\operatorname{Cov}[X_1, X_2]$.

2.9. Do exercise 2.8 under the assumption that Y is N(1, 2).


3. UNCORRELATED AND INDEPENDENT 
RANDOM VARIABLES 


The notion of independence of two random variables, $X_1$ and $X_2$, is
defined in section 6 of Chapter 7. In this section we show how the notion
of independence may be formulated in terms of expectations. At the
same time, by a modification of the condition for independence of random
variables, we are led to the notion of uncorrelated random variables.

We begin by considering the properties of expectations of products of
random variables. Let $X_1$ and $X_2$ be jointly distributed random variables.
By the linearity properties of the operation of taking expectations, it
follows that for any two functions, $g_1(\cdot, \cdot)$ and $g_2(\cdot, \cdot)$,

(3.1)  $$E[g_1(X_1, X_2) + g_2(X_1, X_2)] = E[g_1(X_1, X_2)] + E[g_2(X_1, X_2)]$$

if the expectations on the right side of (3.1) exist. However, it is not true
that a similar relation holds for products; namely, it is not true in general
that $E[g_1(X_1, X_2)\,g_2(X_1, X_2)] = E[g_1(X_1, X_2)]\,E[g_2(X_1, X_2)]$. There is one
special circumstance in which a relation similar to the foregoing is valid,
namely, if the random variables $X_1$ and $X_2$ are independent and if the
functions are functions of one variable only. More precisely, we have the
following theorem:

THEOREM 3A. If the random variables $X_1$ and $X_2$ are independent, then for
any two Borel functions $g_1(\cdot)$ and $g_2(\cdot)$ of one real variable the product
moment of $g_1(X_1)$ and $g_2(X_2)$ is equal to the product of their means; in
symbols,

(3.2)  $$E[g_1(X_1)\,g_2(X_2)] = E[g_1(X_1)]\,E[g_2(X_2)]$$

if the expectations on the right side of (3.2) exist.


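To prove equation (3.2), it suffices to prove it in the form

(3.3)  $$E[Y_1Y_2] = E[Y_1]\,E[Y_2] \quad \text{if } Y_1 \text{ and } Y_2 \text{ are independent},$$

since independence of $X_1$ and $X_2$ implies independence of $g_1(X_1)$ and $g_2(X_2)$.
We write out the proof of (3.3) only for the case of jointly continuous
random variables. We have

$$\begin{aligned} E[Y_1Y_2] &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y_1y_2\, f_{Y_1,Y_2}(y_1, y_2)\,dy_1\,dy_2 \\ &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y_1y_2\, f_{Y_1}(y_1)\,f_{Y_2}(y_2)\,dy_1\,dy_2 \\ &= \int_{-\infty}^{\infty} y_1 f_{Y_1}(y_1)\,dy_1 \int_{-\infty}^{\infty} y_2 f_{Y_2}(y_2)\,dy_2 = E[Y_1]\,E[Y_2]. \end{aligned}$$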
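Theorem 3A can be illustrated with exact arithmetic on a small example. The sketch below builds the joint mass function of two independent discrete random variables as the product of hypothetical marginals (not from the text), picks two arbitrary functions of one variable, and compares $E[g_1(X_1)g_2(X_2)]$ with $E[g_1(X_1)]\,E[g_2(X_2)]$.

```python
from fractions import Fraction
from itertools import product

# Hypothetical marginal pmfs; independence means the joint pmf is the product of the marginals.
p1 = {-1: Fraction(1, 3), 2: Fraction(2, 3)}
p2 = {0: Fraction(1, 2), 1: Fraction(1, 4), 3: Fraction(1, 4)}
joint = {(x1, x2): p1[x1] * p2[x2] for x1, x2 in product(p1, p2)}

def g1(x):
    return x ** 3 + 1   # an arbitrary function of one real variable

def g2(x):
    return 2 ** x       # another arbitrary function of one real variable

lhs = sum(g1(x1) * g2(x2) * p for (x1, x2), p in joint.items())   # E[g1(X1) g2(X2)]
rhs = (sum(g1(x) * p for x, p in p1.items())
       * sum(g2(x) * p for x, p in p2.items()))                  # E[g1(X1)] E[g2(X2)]
print(lhs, rhs)   # 18 18
```

The factorization comes entirely from the product form of `joint`; replacing it by a joint law that is not a product of its marginals would generally break the equality.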


Now suppose that we modify (3.2) and ask only that it hold for the
functions $g_1(x) = x$ and $g_2(x) = x$, so that

(3.4)  $$E[X_1X_2] = E[X_1]\,E[X_2].$$

For reasons that are explained after (3.7), two random variables, $X_1$ and
$X_2$, which satisfy (3.4), are said to be uncorrelated. From (2.10) it follows
that $X_1$ and $X_2$ satisfy (3.4), and therefore are uncorrelated, if and only if

(3.5)  $$\operatorname{Cov}[X_1, X_2] = 0.$$

For uncorrelated random variables the formula given by (2.11) for the
variance of the sum of two random variables becomes particularly elegant:
the variance of the sum of two uncorrelated random variables is equal to
the sum of their variances. Indeed, since by (2.11) the two sides of (3.6)
differ by $2\operatorname{Cov}[X_1, X_2]$,

(3.6)   $\operatorname{Var}[X_1 + X_2] = \operatorname{Var}[X_1] + \operatorname{Var}[X_2]$

if and only if $X_1$ and $X_2$ are uncorrelated.
Two random variables that are independent are uncorrelated, for if
(3.2) holds then, a fortiori, (3.4) holds. The converse is not true in
general; an example of two uncorrelated random variables that are not
independent is given in theoretical exercise 3.2. In the important special
case in which $X_1$ and $X_2$ are jointly normally distributed, it follows that
they are independent if they are uncorrelated (see theoretical exercise 3.3).
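The following sketch exhibits a pair of uncorrelated but dependent random variables. It swaps in a simpler example than the one suggested in theoretical exercise 3.2: X is assumed uniform on {-1, 0, 1} and Y = X². The covariance is computed exactly, and the factorization criterion for independence is tested pair by pair.

```python
from fractions import Fraction

# Hypothetical example: X is uniform on {-1, 0, 1} and Y = X**2.
joint = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}

def E(f):
    """Expectation of f(X, Y) under the joint mass function."""
    return sum(p * f(x, y) for (x, y), p in joint.items())

def marginal(index):
    """Marginal mass function of X (index 0) or Y (index 1)."""
    m = {}
    for pair, p in joint.items():
        m[pair[index]] = m.get(pair[index], 0) + p
    return m

cov = E(lambda x, y: x * y) - E(lambda x, y: x) * E(lambda x, y: y)
px, py = marginal(0), marginal(1)
# Independence requires P[X = x, Y = y] = P[X = x] P[Y = y] for every pair (x, y).
independent = all(joint.get((x, y), 0) == px[x] * py[y] for x in px for y in py)
print(cov, independent)  # 0 False
```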


The correlation coefficient $\rho(X_1, X_2)$ of two jointly distributed random
variables with finite positive variances is defined by

(3.7)   $\rho(X_1, X_2) = \dfrac{\operatorname{Cov}[X_1, X_2]}{\sigma[X_1]\,\sigma[X_2]}.$


SEC. 3 UNCORRELATED AND INDEPENDENT RANDOM VARIABLES 363 


In view of (3.7) and (3.5), two random variables $X_1$ and $X_2$ are uncorrelated
if and only if their correlation coefficient is zero.

The correlation coefficient provides a measure of how good a prediction
of the value of one of the random variables can be formed on the basis of
an observed value of the other. It is subsequently shown that

(3.8)   $|\rho(X_1, X_2)| \le 1.$

Further, $\rho(X_1, X_2) = 1$ if and only if

(3.9)   $\dfrac{X_2 - E[X_2]}{\sigma[X_2]} = \dfrac{X_1 - E[X_1]}{\sigma[X_1]},$

and $\rho(X_1, X_2) = -1$ if and only if

(3.10)   $\dfrac{X_2 - E[X_2]}{\sigma[X_2]} = -\dfrac{X_1 - E[X_1]}{\sigma[X_1]}.$

From (3.9) and (3.10) it follows that if the correlation coefficient equals
1 or −1 then there is perfect prediction: to a given value of one of the
random variables there is one and only one value that the other random
variable can assume. What is even more striking is that $\rho(X_1, X_2) = \pm 1$
if and only if the deviations $X_1 - E[X_1]$ and $X_2 - E[X_2]$ are linearly dependent.

That (3.8), (3.9), and (3.10) hold follows from the following important
theorem.


THEOREM 3B. For any two jointly distributed random variables $X_1$ and
$X_2$ with finite second moments,

(3.11)   $E^2[X_1 X_2] = (E[X_1 X_2])^2 \le E[X_1^2]\,E[X_2^2].$

Further, equality holds in (3.11), that is, $E^2[X_1 X_2] = E[X_1^2]\,E[X_2^2]$, if and
only if, for some constant $t$, $X_2 = tX_1$, which means that the probability
mass distributed over the $(x_1, x_2)$-plane by the joint probability law of the
random variables is situated on the line $x_2 = t x_1$.

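Applied to the random variables $X_1 - E[X_1]$ and $X_2 - E[X_2]$, (3.11)
states that

(3.12)   $\operatorname{Cov}^2[X_1, X_2] \le \operatorname{Var}[X_1]\,\operatorname{Var}[X_2], \qquad |\operatorname{Cov}[X_1, X_2]| \le \sigma[X_1]\,\sigma[X_2].$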


We prove (3.11) as follows. Define, for any real number $t$,
$h(t) = E[(tX_1 - X_2)^2] = t^2 E[X_1^2] - 2t E[X_1 X_2] + E[X_2^2]$. Clearly $h(t) \ge 0$ for all $t$.
Consequently (assuming $E[X_1^2] > 0$, so that $h$ is a genuine quadratic), the
quadratic equation $h(t) = 0$ has either no solutions or exactly one solution.
The equation $h(t) = 0$ has no solutions if and only if
$E^2[X_1 X_2] - E[X_1^2]\,E[X_2^2] < 0$. It has exactly one solution if and only if
$E^2[X_1 X_2] = E[X_1^2]\,E[X_2^2]$. From these facts one may immediately infer
(3.11) and the sentence following it.
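A sketch of (3.7) and (3.12) for discrete joint laws, assuming a joint mass function stored as a dict `{(x1, x2): probability}`; both example laws below are hypothetical. The first has no exact linear relation between the two variables and gives |ρ| < 1; the second puts all its mass on the line x2 = 3 − 2x1 and gives ρ = −1, as (3.10) predicts.

```python
import math
from fractions import Fraction

def covariance_parts(joint):
    """Return Var[X1], Var[X2], Cov[X1, X2] for a joint mass function {(x1, x2): p}."""
    def E(f):
        return sum(p * f(a, b) for (a, b), p in joint.items())
    m1 = E(lambda a, b: a)
    m2 = E(lambda a, b: b)
    v1 = E(lambda a, b: (a - m1) ** 2)
    v2 = E(lambda a, b: (b - m2) ** 2)
    c = E(lambda a, b: (a - m1) * (b - m2))
    return v1, v2, c

def correlation(joint):
    """rho(X1, X2) = Cov[X1, X2] / (sigma[X1] sigma[X2]), as in (3.7)."""
    v1, v2, c = covariance_parts(joint)
    return float(c) / math.sqrt(v1 * v2)

# Hypothetical joint law with no exact linear relation between X1 and X2.
joint_a = {(0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4), (1, 1): Fraction(1, 2)}
v1, v2, c = covariance_parts(joint_a)
assert c * c <= v1 * v2          # (3.12) holds, here strictly
print(correlation(joint_a))      # about 0.577, strictly inside [-1, 1]

# Hypothetical law concentrated on the line x2 = 3 - 2 x1: rho should be -1.
joint_b = {(x, 3 - 2 * x): Fraction(1, 4) for x in range(4)}
print(correlation(joint_b))      # -1.0
```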




The inequalities given by (3.11) and (3.12) are usually referred to as
Schwarz's inequality or Cauchy's inequality.

Independence. It is important to note the difference
between two random variables being independent and being uncorrelated.
They are uncorrelated if and only if (3.4) holds. It may be shown that
they are independent if and only if (3.2) holds for all functions $g_1(\cdot)$ and
$g_2(\cdot)$ for which the expectations in (3.2) exist. More generally, theorem
3C can be proved.


THEOREM 3C. Two jointly distributed random variables $X_1$ and $X_2$ are
independent if and only if each of the following equivalent statements is
true:

(i) Criterion in terms of probability functions. For any Borel sets $B_1$ and
$B_2$ of real numbers, $P[X_1 \text{ is in } B_1, X_2 \text{ is in } B_2] = P[X_1 \text{ is in } B_1]\,P[X_2 \text{ is in } B_2]$.

(ii) Criterion in terms of distribution functions. For any two real
numbers $x_1$ and $x_2$, $F_{X_1, X_2}(x_1, x_2) = F_{X_1}(x_1)\,F_{X_2}(x_2)$.

(iii) Criterion in terms of expectations. For any two Borel functions
$g_1(\cdot)$ and $g_2(\cdot)$, $E[g_1(X_1)\,g_2(X_2)] = E[g_1(X_1)]\,E[g_2(X_2)]$ if the expectations
involved exist.

(iv) Criterion in terms of moment-generating functions (if they exist). For
any two real numbers $t_1$ and $t_2$,

(3.13)   $\psi_{X_1, X_2}(t_1, t_2) = E[e^{t_1 X_1 + t_2 X_2}] = \psi_{X_1}(t_1)\,\psi_{X_2}(t_2).$


THEORETICAL EXERCISES 


3.1. The standard deviation has the properties of the operation of taking the
absolute value of a number: show first that for any two real numbers $x$ and
$y$, $|x + y| \le |x| + |y|$ and $\big||x| - |y|\big| \le |x - y|$. Hint: Square both sides of
the inequalities. Show next that for any two random variables X and Y,

(3.14)   $\sigma[X + Y] \le \sigma[X] + \sigma[Y], \qquad |\sigma[X] - \sigma[Y]| \le \sigma[X - Y].$

Give an example to prove that the variance does not satisfy similar
relationships.

3.2. Show that independent random variables are uncorrelated. Give an
example to show that the converse is false. Hint: Let $X = \sin 2\pi U$,
$Y = \cos 2\pi U$, in which U is uniformly distributed over the interval 0 to 1.


3.3. Prove that if $X_1$ and $X_2$ are jointly normally distributed random variables
whose correlation coefficient vanishes then $X_1$ and $X_2$ are independent.
Hint: Use example 2A.


3.4. Let α and β be the values of a and b which minimize

$f(a, b) = E\left[|X_2 - a - bX_1|^2\right].$

Express α, β, and f(α, β) in terms of $\rho(X_1, X_2)$. The random variable
$\alpha + \beta X_1$ is called the best linear predictor of $X_2$, given $X_1$ [see Section 7,
in particular, (7.13) and (7.14)].

3.5. Prove that (3.9) and (3.10) hold under the conditions stated.

3.6. Let $X_1$ and $X_2$ be jointly distributed random variables possessing finite
second moments. State conditions under which it is possible to find two
uncorrelated random variables, $Y_1$ and $Y_2$, which are linear combinations
of $X_1$ and $X_2$ (that is, $Y_1 = a_{11}X_1 + a_{12}X_2$ and $Y_2 = a_{21}X_1 + a_{22}X_2$ for
some constants $a_{11}, a_{12}, a_{21}, a_{22}$, and $\operatorname{Cov}[Y_1, Y_2] = 0$).

3.7. Let X and Y be jointly normally distributed with mean 0, arbitrary
variances, and correlation ρ. Show that

$P[X \ge 0, Y \ge 0] = P[X \le 0, Y \le 0] = \dfrac{1}{4} + \dfrac{1}{2\pi}\sin^{-1}\rho,$

$P[X \le 0, Y \ge 0] = P[X \ge 0, Y \le 0] = \dfrac{1}{4} - \dfrac{1}{2\pi}\sin^{-1}\rho.$

Hint: Consult H. Cramér, Mathematical Methods of Statistics, Princeton
University Press, 1946, p. 290.

3.8. Suppose that n tickets bear arbitrary numbers $x_1, x_2, \ldots, x_n$, which are
not all the same. Suppose further that two of the tickets are selected at
random without replacement. Show that the correlation coefficient ρ
between the numbers appearing on the two tickets is equal to $-1/(n - 1)$.

3.9. In an urn containing N balls, a proportion p is white and q = 1 − p
are black. A ball is drawn and its color noted. The ball drawn is then
replaced, and Nr balls are added of the same color as the ball drawn. The
process is repeated until n balls have been drawn. For j = 1, 2, ..., n,
let $X_j$ be equal to 1 or 0, depending on whether the ball drawn on the jth
draw is white or black. Show that the correlation coefficient between $X_j$
and $X_k$ (for $j \ne k$) is equal to $r/(1 + r)$. Note that the case $r = -1/N$ corresponds to
sampling without replacement, and $r = 0$ corresponds to sampling with
replacement.
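Theoretical exercise 3.8 can be checked (though not proved) by exact enumeration. The sketch below assumes a hypothetical list of ticket numbers, treats every ordered pair of distinct tickets as equally likely, and computes the correlation exactly; for n = 5 tickets the answer should be −1/(n − 1) = −1/4.

```python
from fractions import Fraction
from itertools import permutations

def two_ticket_correlation(values):
    """Exact correlation between the numbers on two tickets drawn without replacement."""
    draws = list(permutations(range(len(values)), 2))   # equally likely ordered pairs
    p = Fraction(1, len(draws))

    def E(f):
        return sum(p * f(values[i], values[j]) for i, j in draws)

    m1 = E(lambda a, b: a)
    m2 = E(lambda a, b: b)
    cov = E(lambda a, b: (a - m1) * (b - m2))
    var1 = E(lambda a, b: (a - m1) ** 2)
    var2 = E(lambda a, b: (b - m2) ** 2)
    # Each draw is marginally uniform over the tickets, so var1 == var2 and
    # rho = cov / var1 is an exact fraction (nonzero because not all values agree).
    assert var1 == var2
    return cov / var1

tickets = [3, 7, 7, 10, 1]               # hypothetical ticket numbers, n = 5
print(two_ticket_correlation(tickets))   # -1/4
```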


EXERCISES 


3.1. Consider two events A and B such that P[A] = 4, P[B | A] = 1, P[A | B] = 1.
Define random variables X and Y: X = 1 or 0, depending on whether
the event A has or has not occurred, and Y = 1 or 0, depending on whether
the event B has or has not occurred. Find E[X], E[Y], Var[X], Var[Y], and
ρ(X, Y). Are X and Y independent?

3.2. Consider a sample of size 2 drawn with replacement (without replacement)
from an urn containing 4 balls, numbered 1 to 4. Let $X_1$ be the smallest
and $X_2$ be the largest among the numbers drawn in the sample. Find
$\rho(X_1, X_2)$.

3.3. Two fair coins, each with faces numbered 1 and 2, are thrown
independently. Let X denote the sum of the two numbers obtained, and let Y
denote the maximum of the numbers obtained. Find the correlation
coefficient between X and Y.




3.4. Let U, V, and W be uncorrelated random variables with equal variances.
Let X = U + V, Y = U + W. Find the correlation coefficient between
X and Y.

3.5. Let $X_1$ and $X_2$ be uncorrelated random variables. Find the correlation
$\rho(Y_1, Y_2)$ between the random variables $Y_1 = X_1 + X_2$ and $Y_2 = X_1 - X_2$
in terms of the variances of $X_1$ and $X_2$.

3.6. Let $X_1$ and $X_2$ be uncorrelated normally distributed random variables.
Find the correlation $\rho(Y_1, Y_2)$ between the random variables $Y_1 = X_1^2$
and $Y_2 = X_2^2$.


3.7. Consider the random variables whose joint moment-generating function
is given in exercise 2.6. Find $\rho(X_1, X_2)$.

3.8. Consider the random variables whose joint moment-generating function
is given in exercise 2.7. Find $\rho(X_1, X_2)$.

3.9. Consider the random variables whose joint moment-generating function
is given in exercise 2.8. Find $\rho(X_1, X_2)$.

3.10. Consider the random variables whose joint moment-generating function
is given in exercise 2.9. Find $\rho(X_1, X_2)$.


4. EXPECTATIONS OF SUMS OF RANDOM VARIABLES 


Random variables, which arise as, or may be represented as, sums of 
other random variables, play an important role in probability theory. 
In this section we obtain formulas for the mean, mean square, variance, 
and moment-generating function of a sum of random variables. 

Let $X_1, X_2, \ldots, X_n$ be n jointly distributed random variables. Using
the linearity properties of the expectation operation, we immediately
obtain the following formulas for the mean, mean square, and variance
of the sum:


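(4.1)   $E\left[\sum_{k=1}^{n} X_k\right] = \sum_{k=1}^{n} E[X_k],$

(4.2)   $E\left[\left(\sum_{k=1}^{n} X_k\right)^2\right] = \sum_{k=1}^{n} E[X_k^2] + 2\sum_{k=1}^{n}\sum_{j=k+1}^{n} E[X_k X_j],$

(4.3)   $\operatorname{Var}\left[\sum_{k=1}^{n} X_k\right] = \sum_{k=1}^{n} \operatorname{Var}[X_k] + 2\sum_{k=1}^{n}\sum_{j=k+1}^{n} \operatorname{Cov}[X_k, X_j].$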


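Equations (4.2) and (4.3) follow from the facts

(4.4)   $\left(\sum_{k=1}^{n} X_k\right)^2 = \sum_{k=1}^{n}\sum_{j=1}^{n} X_k X_j = \sum_{k=1}^{n}\left(X_k^2 + \sum_{j=1}^{k-1} X_k X_j + \sum_{j=k+1}^{n} X_k X_j\right),$

(4.5)   $\sum_{k=1}^{n}\sum_{j=1}^{k-1} X_k X_j = \sum_{j=1}^{n}\sum_{k=j+1}^{n} X_k X_j = \sum_{k=1}^{n}\sum_{j=k+1}^{n} X_k X_j.$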
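A small exact check of (4.3), in a hypothetical setting chosen only for illustration: two fair dice, with $X_1$ the first die, $X_2$ the second, and $X_3 = X_1 + X_2$, so that the variables are deliberately correlated. Indices in the code run from 0 rather than 1.

```python
from fractions import Fraction
from itertools import product

# Hypothetical setting: two fair dice; X1 = first die, X2 = second die, X3 = X1 + X2.
outcomes = [(d1, d2, d1 + d2) for d1, d2 in product(range(1, 7), repeat=2)]
p = Fraction(1, len(outcomes))   # all 36 outcomes equally likely

def E(f):
    return sum(p * f(w) for w in outcomes)

def cov(k, j):
    """Cov[X_k, X_j] (with k == j this is Var[X_k]); indices are 0-based."""
    mk, mj = E(lambda w: w[k]), E(lambda w: w[j])
    return E(lambda w: (w[k] - mk) * (w[j] - mj))

n = 3
mean_sum = E(lambda w: sum(w))
var_direct = E(lambda w: (sum(w) - mean_sum) ** 2)          # left side of (4.3)
var_formula = (sum(cov(k, k) for k in range(n))             # right side of (4.3)
               + 2 * sum(cov(k, j) for k in range(n) for j in range(k + 1, n)))
print(var_direct, var_formula)  # 70/3 70/3
```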


SEC. 4 EXPECTATIONS OF SUMS OF RANDOM VARIABLES 367 


Equation (4.3) simplifies considerably if the random variables $X_1, X_2,
\ldots, X_n$ are uncorrelated (by which is meant that $\operatorname{Cov}[X_k, X_j] = 0$ for
every $k \ne j$). Then the variance of the sum of the random variables is
equal to the sum of the variances of the random variables; in symbols,

(4.6)   $\operatorname{Var}\left[\sum_{k=1}^{n} X_k\right] = \sum_{k=1}^{n} \operatorname{Var}[X_k] \quad \text{if } \operatorname{Cov}[X_k, X_j] = 0 \text{ for } k \ne j.$

If the random variables $X_1, X_2, \ldots, X_n$ are independent, then we may
give a formula for the moment-generating function of their sum; for any
real number $t$,

(4.7)   $\psi_{X_1 + \cdots + X_n}(t) = \psi_{X_1}(t)\,\psi_{X_2}(t)\cdots\psi_{X_n}(t).$


In words, the moment-generating function of the sum of independent random
variables is equal to the product of their moment-generating functions. The
importance of the moment-generating function in probability theory
derives as much from the fact that (4.7) holds as from the fact that the
moment-generating function may be used to compute moments. The
proof of (4.7) follows immediately, once we rewrite (4.7) explicitly in
terms of expectations:

(4.7')   $E\left[e^{t(X_1 + \cdots + X_n)}\right] = E[e^{tX_1}]\cdots E[e^{tX_n}].$

These results are useful for finding the mean and variance of a random
variable Y (without knowing the probability law of Y) if one can represent
Y as a sum of random variables $X_1, X_2, \ldots, X_n$, the means, variances,
and covariances of which are known.


Example 4A. A binomial random variable as a sum. The number of
successes in n independent repeated Bernoulli trials with probability p of
success at each trial is a random variable. Let us denote it by $S_n$. It has
been shown that $S_n$ obeys a binomial probability law with parameters n
and p. Consequently,

(4.8)   $E[S_n] = np, \qquad \operatorname{Var}[S_n] = npq, \qquad \psi_{S_n}(t) = (pe^t + q)^n.$

We now show that (4.8) is an immediate consequence of (4.1), (4.6), and
(4.7). Define random variables $X_1, X_2, \ldots, X_n$ by $X_k = 1$ or 0, depending
on whether the outcome of the kth trial is a success or a failure. One may
verify that (i) $S_n = X_1 + X_2 + \cdots + X_n$; (ii) $X_1, \ldots, X_n$ are independent
random variables; (iii) for $k = 1, 2, \ldots, n$, $X_k$ is a Bernoulli
random variable, with mean $E[X_k] = p$, variance $\operatorname{Var}[X_k] = pq$, and
moment-generating function $\psi_{X_k}(t) = pe^t + q$. The desired conclusion
may now be inferred.
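The argument of Example 4A can be mirrored numerically. This sketch (the parameters n = 10, p = 3/10 and t = 0.4 are hypothetical) builds the law of $X_1 + \cdots + X_n$ by repeatedly convolving independent Bernoulli mass functions, then compares its mean, variance, and moment-generating function with (4.8).

```python
import math
from fractions import Fraction

def convolve(pmf_a, pmf_b):
    """Mass function of A + B for independent integer-valued A and B."""
    out = {}
    for a, pa in pmf_a.items():
        for b, pb in pmf_b.items():
            out[a + b] = out.get(a + b, 0) + pa * pb
    return out

def sum_of_bernoullis(n, p):
    """Mass function of X_1 + ... + X_n for independent Bernoulli(p) indicators."""
    bernoulli = {0: 1 - p, 1: p}
    pmf = {0: Fraction(1)}
    for _ in range(n):
        pmf = convolve(pmf, bernoulli)
    return pmf

n, p = 10, Fraction(3, 10)          # hypothetical parameters
q = 1 - p
pmf = sum_of_bernoullis(n, p)
mean = sum(k * w for k, w in pmf.items())
var = sum(k * k * w for k, w in pmf.items()) - mean ** 2
t = 0.4                             # any real t; the mgf is evaluated in floating point
mgf = sum(float(w) * math.exp(t * k) for k, w in pmf.items())
print(mean == n * p, var == n * p * q)                               # True True
print(math.isclose(mgf, (float(p) * math.exp(t) + float(q)) ** n))  # True
```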




Example 4B. A hypergeometric random variable as a sum. The number
of white balls drawn in a sample of size n drawn without replacement from
an urn containing N balls, of which a = Np are white, is a random
variable. Let us denote it by $S_n$. It has been shown that $S_n$ obeys a
hypergeometric probability law. Consequently,

(4.9)   $E[S_n] = np, \qquad \operatorname{Var}[S_n] = npq\,\dfrac{N - n}{N - 1}.$

We now show that (4.9) can be derived by means of (4.1) and (4.3),
without knowing the probability law of $S_n$. Define random variables
$X_1, X_2, \ldots, X_n$: $X_k = 1$ or 0, depending on whether a white ball is or
is not drawn on the kth draw. Verify that (i) $S_n = X_1 + X_2 + \cdots + X_n$;
(ii) for $k = 1, 2, \ldots, n$, $X_k$ is a Bernoulli random variable, with mean
$E[X_k] = p$ and $\operatorname{Var}[X_k] = pq$. However, the random variables $X_1, \ldots, X_n$
are not independent, and we need to compute their product moments
$E[X_j X_k]$ and covariances $\operatorname{Cov}[X_j, X_k]$ for any $j \ne k$. Now, $E[X_j X_k] =
P[X_j = 1, X_k = 1]$, so that $E[X_j X_k]$ is equal to the probability that the
balls drawn on the jth and kth draws are both white, which is equal to
$[a(a - 1)]/[N(N - 1)]$. Therefore,

$$\operatorname{Cov}[X_j, X_k] = E[X_j X_k] - E[X_j]\,E[X_k] = \frac{a(a - 1)}{N(N - 1)} - p^2 = -\frac{pq}{N - 1}.$$

Consequently,

$$\operatorname{Var}[S_n] = npq + n(n - 1)\left(-\frac{pq}{N - 1}\right) = npq\left(1 - \frac{n - 1}{N - 1}\right) = npq\,\frac{N - n}{N - 1}.$$

The desired conclusions may now be inferred.
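A sketch confirming (4.9) for one hypothetical urn (N = 20 balls, a = 8 white, samples of n = 5): it computes the exact hypergeometric mean and variance directly from the probability law and compares them with np and npq(N − n)/(N − 1).

```python
from fractions import Fraction
from math import comb

def hypergeometric_moments(N, a, n):
    """Exact mean and variance of the number of white balls among n drawn
    without replacement from N balls of which a are white."""
    pmf = {k: Fraction(comb(a, k) * comb(N - a, n - k), comb(N, n))
           for k in range(max(0, n - (N - a)), min(a, n) + 1)}
    mean = sum(k * w for k, w in pmf.items())
    var = sum(k * k * w for k, w in pmf.items()) - mean ** 2
    return mean, var

N, a, n = 20, 8, 5                  # hypothetical urn and sample size
p = Fraction(a, N)
q = 1 - p
mean, var = hypergeometric_moments(N, a, n)
print(mean == n * p)                                 # True
print(var == n * p * q * Fraction(N - n, N - 1))     # True
```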


Example 4C. The number of occupied urns as a sum. If r distinguishable
balls are distributed into M distinguishable urns in such a way that each
ball is equally likely to go into any urn, what is the expected number of
occupied urns?

Solution: For $k = 1, 2, \ldots, M$ let $X_k = 1$ or 0, depending on whether
the kth urn is or is not occupied. Then $S = X_1 + X_2 + \cdots + X_M$ is
the number of occupied urns, and E[S] the expected number of occupied
urns. The probability that a given urn will be occupied is equal to
$1 - [1 - (1/M)]^r$. Therefore, $E[X_k] = 1 - [1 - (1/M)]^r$ and $E[S] =
M\{1 - [1 - (1/M)]^r\}$.
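For small cases the answer to Example 4C can be confirmed by brute force. The sketch below (the (r, M) pairs are hypothetical) averages the number of distinct occupied urns over all $M^r$ equally likely placements and compares the result with $M\{1 - [1 - (1/M)]^r\}$.

```python
from fractions import Fraction
from itertools import product

def expected_occupied_bruteforce(r, M):
    """Average number of occupied urns over all M**r equally likely placements."""
    placements = list(product(range(M), repeat=r))   # urn chosen by each ball
    total = sum(len(set(urns)) for urns in placements)
    return Fraction(total, len(placements))

def expected_occupied_formula(r, M):
    """M {1 - [1 - 1/M]^r}, the result of Example 4C."""
    return M * (1 - (1 - Fraction(1, M)) ** r)

for r, M in [(1, 3), (3, 3), (4, 2), (5, 4)]:       # hypothetical small cases
    print(r, M, expected_occupied_bruteforce(r, M) == expected_occupied_formula(r, M))
```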


THEORETICAL EXERCISES 


4.1. Waiting times in coupon collecting. Assume that each pack of cigarettes of
a certain brand contains one of a set of N cards and that these cards are
distributed among the packs at random (assume that the number of packs
available is infinite). Let $S_N$ be the minimum number of packs that must
be purchased in order to obtain a complete set of N cards. Show that
$E[S_N] = N\sum_{k=1}^{N}(1/k)$, which may be evaluated by using the formula (see H.
Cramér, Mathematical Methods of Statistics, Princeton University Press,
1946, p. 125)

$\sum_{k=1}^{N} \dfrac{1}{k} = 0.57722 + \log_e N + \dfrac{1}{2N} + R_N,$

in which $|R_N| < 1/(8N^2)$. Verify that $E[S_N] \approx 236$ if N = 52. Hint:
For $k = 0, 1, \ldots, N - 1$ let $X_k$ be the number of packs that must be
purchased after k distinct cards have been collected in order to collect the
(k + 1)st distinct card. Show that $E[X_k] = N/(N - k)$ by using the fact
that $X_k$ has a geometric distribution.

4.2. Continuation of (4.1). For $r = 1, 2, \ldots, N$ let $S_r$ be the minimum number
of packs that must be purchased in order to obtain r different cards. Show
that

$E[S_r] = N\left(\dfrac{1}{N} + \dfrac{1}{N - 1} + \dfrac{1}{N - 2} + \cdots + \dfrac{1}{N - r + 1}\right),$

$\operatorname{Var}[S_r] = N\left(\dfrac{1}{(N - 1)^2} + \dfrac{2}{(N - 2)^2} + \cdots + \dfrac{r - 1}{(N - r + 1)^2}\right).$

Show that approximately (for large N)

$E[S_r] \approx N \log_e \dfrac{N}{N - r}.$

Show further that the moment-generating function of $S_r$ is given by

$\psi_{S_r}(t) = \prod_{k=0}^{r-1} \dfrac{(N - k)e^t}{N - ke^t}.$

4.3. Continuation of (4.1). For r preassigned cards let $T_r$ be the minimum
number of packs that must be purchased in order to obtain all r cards.
Show that

$E[T_r] = \sum_{k=1}^{r} \dfrac{N}{r - k + 1}, \qquad \operatorname{Var}[T_r] = \sum_{k=1}^{r} \dfrac{N(N - r + k - 1)}{(r - k + 1)^2}.$

4.4. The mean and variance of the number of matches. Let $S_M$ be the number of
matches obtained by distributing, 1 to an urn, M balls, numbered 1 to M,
among M urns, numbered 1 to M. It was shown in theoretical exercise
3.3 of Chapter 5 that $E[S_M] = 1$ and $\operatorname{Var}[S_M] = 1$. Show this, using the
fact that $S_M = X_1 + \cdots + X_M$, in which $X_k = 1$ or 0, depending on
whether the kth urn does or does not contain ball number k. Hint: Show
that $\operatorname{Cov}[X_j, X_k] = (M - 1)/M^2$ or $1/[M^2(M - 1)]$, depending on whether
$j = k$ or $j \ne k$.


4.5. Show that if $X_1, \ldots, X_n$ are independent random variables with zero means
and finite fourth moments, then the third and fourth moments of the sum
$S_n = X_1 + \cdots + X_n$ are given by

$E[S_n^3] = \sum_{k=1}^{n} E[X_k^3], \qquad E[S_n^4] = \sum_{k=1}^{n} E[X_k^4] + 6\sum_{k=1}^{n}\sum_{j=k+1}^{n} E[X_k^2]\,E[X_j^2].$

If the random variables $X_1, \ldots, X_n$ are independent and identically
distributed as a random variable X, then

$E[S_n^3] = nE[X^3], \qquad E[S_n^4] = nE[X^4] + 3n(n - 1)E^2[X^2].$

4.6. Let $X_1, X_2, \ldots, X_n$ be a random sample of a random variable X. Define
the sample mean $\bar{X}$ and the sample variance $S^2$ by

$\bar{X} = \dfrac{1}{n}\sum_{k=1}^{n} X_k, \qquad S^2 = \dfrac{1}{n - 1}\sum_{k=1}^{n} (X_k - \bar{X})^2.$

(i) Show that $E[S^2] = \sigma^2$, $\operatorname{Var}[S^2] = (\sigma^4/n)\left[(\mu_4/\sigma^4) - (n - 3)/(n - 1)\right]$, in
which $\sigma^2 = \operatorname{Var}[X]$, $\mu_4 = E[(X - E[X])^4]$. Hint: show that

$\sum_{k=1}^{n} (X_k - E[X])^2 = \sum_{k=1}^{n} (X_k - \bar{X})^2 + n(\bar{X} - E[X])^2.$

(ii) Show that $\rho(X_i - \bar{X}, X_j - \bar{X}) = -1/(n - 1)$ for $i \ne j$.
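A sketch for theoretical exercise 4.1, assuming equally likely cards: `expected_packs` sums the geometric means E[X_k] = N/(N − k) from the hint, the remainder in the harmonic-sum formula is checked against the bound 1/(8N²) (numerically it comes out slightly negative, about −1/(12N²)), and a seeded simulation gives an independent rough check. The function names are hypothetical.

```python
import math
import random
from fractions import Fraction

def expected_packs(N):
    """E[S_N] = sum over k = 0..N-1 of E[X_k] = N/(N - k), i.e. N (1 + 1/2 + ... + 1/N)."""
    return sum(Fraction(N, N - k) for k in range(N))

def simulate_packs(N, rng):
    """Number of packs bought until all N equally likely cards have appeared."""
    seen, packs = set(), 0
    while len(seen) < N:
        seen.add(rng.randrange(N))
        packs += 1
    return packs

N = 52
exact = expected_packs(N)
print(round(float(exact)))                    # 236

# Remainder term R_N in the harmonic-sum formula quoted in exercise 4.1.
gamma = 0.5772156649015329                    # Euler's constant
harmonic = float(exact) / N
remainder = harmonic - (gamma + math.log(N) + 1 / (2 * N))
print(abs(remainder) < 1 / (8 * N ** 2))      # True

rng = random.Random(1)                        # fixed seed, so the run is repeatable
estimate = sum(simulate_packs(N, rng) for _ in range(2000)) / 2000
print(estimate)                               # a Monte Carlo value near 236
```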


EXERCISES 


4.1. Let $X_1$, $X_2$, and $X_3$ be independent normally distributed random variables,
each with mean 1 and variance 3. Find $P[X_1 + X_2 + X_3 > 0]$.

4.2. Consider a sequence of independent repeated Bernoulli trials in which the
probability of success on any trial is p = ў.
(i) Let $S_n$ be the number of trials required to achieve the nth success. Find
$E[S_n]$ and $\operatorname{Var}[S_n]$. Hint: Write $S_n$ as a sum, $S_n = X_1 + \cdots + X_n$, in
which $X_k$ is the number of trials between the (k − 1)st and kth successes.
The random variables $X_1, \ldots, X_n$ are independent and identically
distributed.
(ii) Let $T_n$ be the number of failures encountered before the nth success is
achieved. Find $E[T_n]$ and $\operatorname{Var}[T_n]$.

4.3. A fair coin is tossed n times. Let $T_n$ be the number of times in the n tosses
that a tail is followed by a head. Show that $E[T_n] = (n - 1)/4$ and
$E[T_n^2] = (n - 1)/4 + [(n - 2)(n - 3)]/16$. Find $\operatorname{Var}[T_n]$.

4.4. A man with n keys wants to open his door. He tries the keys independently
and at random. Let $N_n$ be the number of trials required to open the door.
Find $E[N_n]$ and $\operatorname{Var}[N_n]$ if (i) unsuccessful keys are not eliminated from
further selections, (ii) if they are. Assume that exactly one of the keys can
open the door.


SEC. 5 THE CENTRAL LIMIT THEOREM 371


In exercises 4.5 and 4.6 consider an item of equipment that is composed by
assembling in a straight line 4 components of lengths $X_1$, $X_2$, $X_3$, and $X_4$,
respectively. Let $E[X_1] = 20$, $E[X_2] = 30$, $E[X_3] = 40$, $E[X_4] = 60$.

4.5. Assume $\operatorname{Var}[X_j] = 4$ for $j = 1, \ldots, 4$.
(i) Find the mean and variance of the length $L = X_1 + X_2 + X_3 + X_4$
of the item if $X_1$, $X_2$, $X_3$, and $X_4$ are uncorrelated.
(ii) Find the mean and variance of L if $\rho(X_j, X_k) = 0.2$ for $1 \le j < k \le 4$.

4.6. Assume that $\sigma[X_j] = (0.1)E[X_j]$ for $j = 1, \ldots, 4$. Find the ratio $E[L]/\sigma[L]$,
called the measurement signal-to-noise ratio of the length L (see section 6),
for both cases considered in exercise 4.5.
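The two moment formulas stated in exercise 4.3 can be checked for small n by enumerating all $2^n$ equally likely toss sequences; the sketch below does only that and leaves $\operatorname{Var}[T_n]$ to the reader. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

def tail_head_moments(n):
    """Exact E[T_n] and E[T_n^2], where T_n counts tails immediately followed by heads,
    by enumerating all 2**n equally likely sequences of n fair tosses."""
    weight = Fraction(1, 2 ** n)
    first = second = Fraction(0)
    for seq in product("HT", repeat=n):
        t = sum(1 for i in range(n - 1) if seq[i] == "T" and seq[i + 1] == "H")
        first += weight * t
        second += weight * t * t
    return first, second

for n in range(2, 11):
    m1, m2 = tail_head_moments(n)
    assert m1 == Fraction(n - 1, 4)
    assert m2 == Fraction(n - 1, 4) + Fraction((n - 2) * (n - 3), 16)
print("formulas of exercise 4.3 agree for n = 2, ..., 10")
```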


5. THE LAW OF LARGE NUMBERS AND THE 
CENTRAL LIMIT THEOREM 


In applications of probability theory to real phenomena two results
of the theory of probability play a conspicuous role. These
results are known as the law of large numbers and the central limit theorem.
At this point in this book we have sufficient mathematical tools available
to apply these basic results. In Chapters 9 and 10 we develop
the additional mathematical tools required to prove these theorems with
a sufficient degree of generality.
A set of n observations $X_1, X_2, \ldots, X_n$ is said to constitute a random
sample of a random variable X if $X_1, X_2, \ldots, X_n$ are independent random
variables, identically distributed as X. Let

(5.1)   $S_n = X_1 + X_2 + \cdots + X_n$

be the sum of the observations. Their arithmetic mean

(5.2)   $M_n = \dfrac{S_n}{n} = \dfrac{X_1 + X_2 + \cdots + X_n}{n}$

is called the sample mean.

From (4.1), (4.6), and (4.7), we obtain the following expressions for the
mean, variance, and moment-generating function of $S_n$ and $M_n$, in terms
of the mean, variance, and moment-generating function of X (assuming
these exist):

(5.3)   $E[S_n] = nE[X], \qquad \operatorname{Var}[S_n] = n\operatorname{Var}[X], \qquad \psi_{S_n}(t) = [\psi_X(t)]^n,$

(5.4)   $E[M_n] = E[X], \qquad \operatorname{Var}[M_n] = \dfrac{1}{n}\operatorname{Var}[X], \qquad \psi_{M_n}(t) = \left[\psi_X\left(\dfrac{t}{n}\right)\right]^n.$


From (5.4) we obtain the striking fact that the variance of the sample
mean $(1/n)S_n$ tends to 0 as the sample size n tends to infinity. Now, by
Chebyshev's inequality, it follows that if a random variable has a small
variance then it is approximately equal to its mean, in the sense that
with probability close to 1 an observation of the random variable will yield
an observed value approximately equal to the mean of the random variable;
in particular, since by Chebyshev's inequality the probability of a deviation
of h or more standard deviations is at most $1/h^2$, the probability is at least
0.99 that an observed value of the random variable is within 10 standard
deviations of the mean of the random variable. We have thus established
that the sample mean of a random sample $X_1, X_2, \ldots, X_n$ of a random variable,
with a probability that can be made as close to 1 as desired by taking a
large enough sample, is approximately equal to the ensemble mean E[X].
This fact, known as the law of large numbers, was first established by
Bernoulli in 1713 (see section 5 of Chapter 5). The validity of the law of
large numbers is the mathematical expression of the fact that increasingly
accurate measurements of a quantity (such as the length of a rod) are
obtained by averaging an increasingly large number of observations of the
value of the quantity. A precise mathematical statement and proof of the
law of large numbers is given in Chapter 10.
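A sketch of how Chebyshev's inequality combines with (5.4): since $\operatorname{Var}[M_n] = \operatorname{Var}[X]/n$, the bound $P[|M_n - E[X]| \ge \epsilon] \le \operatorname{Var}[X]/(n\epsilon^2)$ tells how large n must be. The helper names and the target (deviation 0.1σ, probability 0.99) are hypothetical; Chebyshev's bound is conservative, and the normal approximation discussed next typically needs a much smaller n.

```python
import math
from fractions import Fraction

def chebyshev_bound(h):
    """Chebyshev: P[|X - E[X]| >= h sigma[X]] <= 1/h**2."""
    return Fraction(1, h * h)

def chebyshev_sample_size(variance, eps, delta):
    """Smallest n for which Chebyshev's inequality applied to M_n, whose variance
    is Var[X]/n by (5.4), guarantees P[|M_n - E[X]| >= eps] <= delta."""
    variance, eps, delta = Fraction(variance), Fraction(eps), Fraction(delta)
    return math.ceil(variance / (delta * eps ** 2))

print(1 - chebyshev_bound(10))                   # 99/100
# Hypothetical target: within 0.1 standard deviation (Var[X] = 1) with probability >= 0.99.
print(chebyshev_sample_size(1, "0.1", "0.01"))   # 10000
```

Strings such as `"0.1"` are passed so that `Fraction` parses them exactly instead of inheriting binary rounding from floats.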

However, even more can be proved about the sample mean than that it
tends to be equal to the mean. One can approximately evaluate, for any
interval about the mean, the probability that the sample mean will have an
observed value in that interval, since the sample mean is approximately
normally distributed. More generally, it may be shown that if $S_n$ is the
sum of independent identically distributed random variables $X_1, X_2, \ldots, X_n$,
with finite means and variances, then, for any real numbers $a < b$,

(5.5)   $P[a < S_n \le b] = P\left[\dfrac{a - E[S_n]}{\sigma[S_n]} < \dfrac{S_n - E[S_n]}{\sigma[S_n]} \le \dfrac{b - E[S_n]}{\sigma[S_n]}\right] \approx \Phi\left(\dfrac{b - E[S_n]}{\sigma[S_n]}\right) - \Phi\left(\dfrac{a - E[S_n]}{\sigma[S_n]}\right).$

In words, (5.5) may be expressed as follows: the sum of a large number
of independent identically distributed random variables with finite means
and variances, normalized to have mean zero and variance 1, is approximately
normally distributed. Equation (5.5) represents a rough statement
of one of the most important theorems of probability theory. In 1920
G. Pólya gave this theorem the name "the central limit theorem of
probability theory." This name continues to be used today, although a
more apt description would be "the normal convergence theorem."
The central limit theorem was first proved by De Moivre in 1733 for the
case in which $X_1, X_2, \ldots, X_n$ are Bernoulli random variables, so that $S_n$
is then a binomial random variable. A proof of (5.5) in this case (with a
continuity correction) was given in Section 2 of Chapter 6. The
determination of the exact conditions for the validity of (5.5) constituted the
outstanding problem of probability theory from its beginning until the
decade of the 1930s. A precise mathematical statement and proof of the
central limit theorem is given in Chapter 10.
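The De Moivre case of (5.5) can be checked numerically. This sketch (with hypothetical n = 100, p = 1/2, a = 40, b = 60) compares the exact binomial probability $P[a < S_n \le b]$ with the normal approximation of (5.5) taken without a continuity correction; both come out near 0.954. The standard normal distribution function is built from `math.erf`.

```python
import math

def phi(x):
    """Standard normal distribution function, via math.erf."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def binomial_interval_exact(n, p, a, b):
    """P[a < S_n <= b] for a binomial S_n, summed term by term."""
    q = 1 - p
    return sum(math.comb(n, k) * p ** k * q ** (n - k)
               for k in range(n + 1) if a < k <= b)

def clt_interval(n, p, a, b):
    """Right side of (5.5) with E[S_n] = np and sigma[S_n] = sqrt(npq)."""
    mean, sd = n * p, math.sqrt(n * p * (1 - p))
    return phi((b - mean) / sd) - phi((a - mean) / sd)

n, p, a, b = 100, 0.5, 40, 60          # hypothetical choices
exact = binomial_interval_exact(n, p, a, b)
approx = clt_interval(n, p, a, b)
print(exact, approx)
```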

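It may be of interest to outline the basic idea of the proof of (5.5), even
though the mathematical tools are not at hand to justify the statements
made. To prove (5.5) it suffices to prove that the moment-generating
function

(5.6)   $\psi_n(t) = E\left[\exp\left(t\,\dfrac{S_n - E[S_n]}{\sigma[S_n]}\right)\right] = \left[\psi_{X - E[X]}\left(\dfrac{t}{\sqrt{n}\,\sigma[X]}\right)\right]^n$

satisfies, for t in a neighborhood of 0,

(5.7)   $\lim_{n \to \infty} \log \psi_n(t) = \dfrac{t^2}{2},$

in which $t^2/2$ is the logarithm of the moment-generating function of a
random variable Y which is N(0, 1). Now, expanding in Taylor series,

(5.8)   $\psi_{X - E[X]}(u) = 1 + \dfrac{1}{2}\sigma^2[X]\,u^2 + A(u),$

where the remainder A(u) satisfies the condition $\lim_{u \to 0} A(u)/u^2 = 0$.
Similarly, $\log(1 + v) = v + B(v)$, where $\lim_{v \to 0} B(v)/v = 0$. Consequently,
one may show that for values of u sufficiently close to 0

(5.9)   $\log \psi_{X - E[X]}(u) = \dfrac{1}{2}\sigma^2[X]\,u^2 + C(u),$

where

(5.10)   $\lim_{u \to 0} \dfrac{C(u)}{u^2} = 0.$

It then follows that

(5.11)   $\log \psi_n(t) = n \log \psi_{X - E[X]}\left(\dfrac{t}{\sqrt{n}\,\sigma[X]}\right) = \dfrac{t^2}{2} + n\,C\left(\dfrac{t}{\sqrt{n}\,\sigma[X]}\right),$

where

(5.12)   $\lim_{n \to \infty} n\,C\left(\dfrac{t}{\sqrt{n}\,\sigma[X]}\right) = \dfrac{t^2}{\sigma^2[X]} \lim_{n \to \infty} \dfrac{C\left(t/(\sqrt{n}\,\sigma[X])\right)}{\left(t/(\sqrt{n}\,\sigma[X])\right)^2} = 0.$

From (5.11) and (5.12) one obtains (5.7). Our heuristic outline of the
proof of (5.5) is now complete.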
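The limit (5.7) can be watched numerically in one hypothetical case: steps X equal to +1 or −1 with probability 1/2 each, for which $E[X] = 0$, $\sigma[X] = 1$, and $\psi_{X - E[X]}(u) = \cosh u$. By (5.6), $\log\psi_n(t) = n\log\cosh(t/\sqrt{n})$, which should approach $t^2/2$.

```python
import math

def log_psi_n(t, n):
    """log psi_n(t) from (5.6) for steps X = +1 or -1 with probability 1/2 each.
    Here E[X] = 0, sigma[X] = 1 and psi_{X - E[X]}(u) = cosh(u), so
    log psi_n(t) = n log cosh(t / sqrt(n))."""
    return n * math.log(math.cosh(t / math.sqrt(n)))

t = 1.0
for n in (1, 10, 100, 10_000, 1_000_000):
    print(n, log_psi_n(t, n))   # values increase toward t**2 / 2 = 0.5
```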


Given any random variable X with finite mean and finite positive variance,
we define its standardization, denoted by X*, as the random variable

(5.13)   $X^* = \dfrac{X - E[X]}{\sigma[X]}.$

The standardization X* is a dimensionless random variable, with mean
$E[X^*] = 0$ and variance $\sigma^2[X^*] = 1$.




The central limit theorem of probability theory can now be stated:
the standardization $(S_n)^*$ of the sum $S_n$ of a large number n of independent
and identically distributed random variables is approximately normally
distributed. In Chapter 10 it is shown that this result may be considerably
extended to include cases in which $S_n$ is the sum of dependent nonidentically
distributed random variables.


Example 5A. Reliability. Evaluation of the reliability of rockets is
a problem of obvious importance in the space age. By the reliability of a
rocket one means the probability p that an attempted launching of the
rocket will be successful. Suppose that rockets of a certain type have, by
many tests, been established as 90% reliable. Suppose that a modification
of the rocket design is being considered. Which of the following sets of
evidence throws more doubt on the hypothesis that the modified rocket
is only 90% reliable: (i) of 100 modified rockets tested, 96 performed
satisfactorily; (ii) of 64 modified rockets tested, 62 (equal to 96.9%)
performed satisfactorily?

Solution: Let $S_1$ be the number of rockets in the group of 100 which
performed satisfactorily, and let $S_2$ be the number of rockets in the group
of 64 which performed satisfactorily. If p is the reliability of a rocket,
then $S_1$ and $S_2$ have standardizations (since $S_1$ and $S_2$ have binomial
distributions)

$(S_1)^* = \dfrac{S_1 - 100p}{10\sqrt{pq}}, \qquad (S_2)^* = \dfrac{S_2 - 64p}{8\sqrt{pq}}.$

If p = 0.9, $S_1 = 96$, and $S_2 = 62$, then $(S_1)^* = (96 - 90)/(10 \times 0.3) = 2$ and
$(S_2)^* = (62 - 57.6)/(8 \times 0.3) \approx 1.83$. If $(S_1)^*$ is N(0, 1), the probability of observing
a value of $(S_1)^*$ greater than or equal to 2 is 0.023. If $(S_2)^*$ is N(0, 1), the
probability of observing a value of $(S_2)^*$ greater than or equal to 1.83 is 0.034.
Consequently, scoring 96 successes in 100 tries is better evidence than scoring
62 successes in 64 tries for the hypothesis that the modified rocket has a higher
reliability than the original rocket.
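The arithmetic of Example 5A can be reproduced with the standard normal tail computed from `math.erfc`; the helper names are hypothetical. With the unrounded value $(S_2)^* = 1.8333$ the tail comes out near 0.033; the 0.034 quoted in the text corresponds to first rounding $(S_2)^*$ to 1.83.

```python
import math

def upper_tail(z):
    """P[Z >= z] for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def standardized(successes, trials, p):
    """(S - np) / sqrt(npq) for a binomial count S."""
    return (successes - trials * p) / math.sqrt(trials * p * (1 - p))

z1 = standardized(96, 100, 0.9)
z2 = standardized(62, 64, 0.9)
print(round(z1, 2), round(z2, 2))                           # 2.0 1.83
print(round(upper_tail(z1), 3), round(upper_tail(1.83), 3))  # 0.023 0.034
print(upper_tail(z2))                                       # unrounded z: about 0.033
```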


Example 5B. Brownian motion and random walk. A particle (of
diameter 10-1 centimeter, say) immersed in a liquid or gas exhibits ceaseless
irregular motions that a…

… the major successes of stati…
… Einstein showed that the Bri…




that the particles are subject to the continual bombardment of the
molecules of the surrounding medium. The theoretical results of Einstein were
soon confirmed by the exact experimental work of Perrin. To appreciate
the importance of these events, the reader should be aware that in the
years around 1900 atoms and molecules were far from being accepted as
they are today; there were still physicists who did not believe in them.
After Einstein's work this was possible no longer (see Max Born, Natural
Philosophy of Cause and Chance, Oxford, 1949, p. 63). If we let $S_t$ denote
the displacement after t minutes of a particle in Brownian motion from
its starting point, Einstein showed that $S_t$ has probability density function

(5.14)   $f_{S_t}(x) = \left(\dfrac{1}{4\pi Dt}\right)^{1/2} \exp\left(-\dfrac{x^2}{4Dt}\right),$

in which D is a constant, called the diffusion coefficient, which depends on
the absolute temperature and friction coefficient of the surrounding
medium. In words, $S_t$ is normally distributed with mean 0 and variance

(5.15)   $E[S_t^2] = 2Dt.$

The result given by (5.15) is especially important; it states that the mean
square displacement $E[S_t^2]$ of a particle in Brownian motion is
proportional to the time t. A model for Brownian motion is provided by a
particle undergoing a random walk. Let $X_1, X_2, \ldots, X_n$ be independent
random variables, identically distributed as a random variable X, which has
mean $E[X] = 0$ and finite variance $E[X^2]$. The sum $S_n = X_1 + X_2 + \cdots + X_n$
represents the displacement from its starting position of a point
(or particle) performing a random walk on a straight line by taking at the
kth step a displacement $X_k$. After n steps the total displacement $S_n$ has
mean and mean square given by

(5.16)   $E[S_n] = 0, \qquad E[S_n^2] = nE[X^2].$

Thus the mean-square displacement of a particle undergoing a random
walk is proportional to the number of steps n. Since $S_n$ is approximately
normally distributed in the sense that (5.5) holds, it might be thought that
the probability density function of $S_n$ is approximately given by

(5.17)   $f_{S_n}(x) \approx \left(\dfrac{1}{2\pi nB}\right)^{1/2} \exp\left(-\dfrac{x^2}{2nB}\right),$

in which $B = E[X^2]$. However, (5.17) represents a stronger conclusion
than (5.5). Equation (5.17) is a normal convergence theorem for probability
density functions, whereas (5.5) is a normal convergence theorem
for distribution functions; (5.17) implies (5.5), but the converse is not
true. It may be shown that a sufficient condition for the validity of (5.17)
is that the random variable X possesses a square integrable probability
density function. From the fact that $S_n$ is approximately normally
distributed in the sense that (5.5) holds, it follows that it is very
unlikely that an observed value of $S_n$ will be more than 3 or 4 standard
deviations from its mean. Consequently, in a random walk in which the
individual steps have mean 0, it is very unlikely after n steps that the
distance from the origin will be greater than $4\sigma[X]\sqrt{n}$.
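A seeded simulation sketch of the random walk, assuming steps of +1 or −1 with probability 1/2 (so $E[X] = 0$ and $E[X^2] = 1$); n = 100 steps and 10,000 walks are hypothetical choices. It estimates $E[S_n^2]$, which (5.16) says equals n, and the fraction of walks ending farther than $4\sigma[X]\sqrt{n}$ from the origin, which should be tiny.

```python
import random

def random_walk_endpoints(n_steps, n_walks, rng):
    """Endpoints S_n of independent walks whose steps are +1 or -1 with probability 1/2."""
    return [sum(rng.choices((-1, 1), k=n_steps)) for _ in range(n_walks)]

rng = random.Random(2024)            # fixed seed, so the run is repeatable
n = 100
ends = random_walk_endpoints(n, 10_000, rng)
# Here E[X^2] = 1, so (5.16) gives E[S_n^2] = n.
mean_square = sum(s * s for s in ends) / len(ends)
# Fraction of walks ending farther than 4 sigma[X] sqrt(n) = 40 from the origin.
far = sum(abs(s) > 4 * n ** 0.5 for s in ends) / len(ends)
print(mean_square, far)
```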


5.1. Which of the following sets of evidence throws more doubt on the
hypothesis that new born babies are as likely to be boys as girls: (i) of
10,000 new born babies, 5100 are male; (ii) of 1000 new born babies, 510
are male?

5.2. The game of roulette is described in example 10. Find the probability
that the total amount of money lost by a gambling house on … bets made by
the public on an odd outcome at roulette will be negative.

5.3. As an estimate of the unknown mean E[X] of a random variable, it is
customary to take the sample mean $\bar{X} = (X_1 + X_2 + \cdots + X_n)/n$ of a
random sample $X_1, X_2, \ldots, X_n$ of the random variable X. How large a
sample should one observe if there is to be a probability of at least …
that the sample mean $\bar{X}$ will not differ from the true mean E[X] by more
than 25% of the standard deviation σ[X]?

5.4. A man plays a game in which his probability of winning or losing a dollar
is 1/2. Let $S_n$ be the man's fortune (that is, the amount he has won)
after n independent plays of the game.
(i) Find $E[S_n]$ and $\operatorname{Var}[S_n]$. Hint: Write $S_n = X_1 + \cdots + X_n$, in which
$X_i$ is the change in the man's fortune on the ith play of the game.
(ii) Find approximately the probability that after 10,000 plays of the game
the change in the man's fortune will be between −50 and 50 dollars.

5.5. Consider a game of chance in which one may win 10 dollars or lose 1, 2, 3,
or 4 dollars; each possibility has probability 0.20. How many times can
this game be played if there is to be a probability of at least 95% that in the
final outcome the average gain or loss per game will be between −2 and
+2?

5.6. A certain gambler's daily income (in dollars) is a random variable X
uniformly distributed over the interval −3 to 3.
(i) Find approximately the probability that after 100 days of independent
play he will have won more than 200 dollars.


SEC. 5 THE CENTRAL LIMIT THEOREM 371 


(iii) Determine the number of days the gambler can play in order to have a probability greater than 95% that his total winnings on these days will be less than 180 dollars in absolute value.

5.7. Add 100 real numbers, each of which is rounded off to the nearest integer. Assume that each rounding-off error is a random variable uniformly distributed between −½ and ½ and that the 100 rounding-off errors are independent. Find approximately the probability that the error in the sum will be between −3 and 3. Find the quantity A such that the probability is approximately 99% that the error in the sum will be less than A in absolute value.

5.8. If each strand in a rope has a breaking strength, with mean 20 pounds and standard deviation 2 pounds, and the breaking strength of a rope is the sum of the (independent) breaking strengths of all the strands, what is the probability that a rope made up of 64 strands will support a weight of (i) 1280 pounds, (ii) 1240 pounds?

5.9. A delivery truck carries loaded cartons of items. If the weight of each carton is a random variable, with mean 50 pounds and standard deviation 5 pounds, how many cartons can the truck carry so that the probability of the total load exceeding 1 ton will be less than 5%? State any assumptions made.

5.10. Consider light bulbs, produced by a machine, whose life X in hours is a random variable obeying an exponential probability law with a mean lifetime of 1000 hours.
(i) Find approximately the probability that a sample of 100 bulbs selected at random from the output of the machine will contain between 30 and 40 bulbs with a lifetime greater than 1020 hours.
(ii) Find approximately the probability that the sum of the lifetimes of 100 bulbs selected randomly from the output of the machine will be less than 110,000 hours.

5.11. The apparatus known as Galton's quincunx is described in exercise 2.10 of Chapter 6. Assume that in passing from one row to the next the change X in the abscissa of a ball is a random variable, with the following probability law: P[X = ½] = P[X = −½] = ½ − η, P[X = 3/2] = P[X = −3/2] = η, in which η is an unknown constant. In an experiment performed with a quincunx consisting of 100 rows, it was found that 80% of the balls inserted into the apparatus passed through the 21 central openings of the last row (that is, the openings with abscissas 0, ±1, ±2, ..., ±10). Determine the value of η consistent with this result.

5.12. A man invests a total of N dollars in a group of n securities, whose rates of return (interest rates) are independent random variables X1, X2, ..., Xn, respectively, with means μ1, μ2, ..., μn and variances σ1², σ2², ..., σn², respectively. If the man invests Nj dollars in the jth security, then the return in dollars on this particular portfolio is a random variable R given by R = N1X1 + N2X2 + ··· + NnXn. Let the standard deviation σ[R] of R be used as a measure of the risk involved in selecting a given portfolio of securities. In particular, let us consider the problem of distributing


378 EXPECTATION OF A RANDOM VARIABLE cH. 8 


investments of 5500 dollars between two securities, one of which has a rate of return X1, with mean 6% and standard deviation 1%, whereas the other has a rate of return X2, with mean 15% and standard deviation 10%.
(i) If it is desired to hold the risk to a minimum, what amounts N1 and N2 should be invested in the respective securities? What are the mean and variance of the return from this portfolio?
(ii) What is the amount of risk that must be taken in order to achieve a portfolio whose mean return is equal to 400 dollars?
(iii) By means of Chebyshev's inequality, find an interval, symmetric about 400 dollars, that, with probability greater than 75%, will contain the return R from the portfolio with a mean return E[R] = 400 dollars.

Would you be justified in assuming that the return R is approximately normally distributed?


6. THE MEASUREMENT SIGNAL-TO-NOISE RATIO 
OF A RANDOM VARIABLE 


A question of great importance in science and engineering is the following: under what conditions can an observed value of a random variable X be identified with its mean E[X]? We have seen in section 5 that if X̄ is the arithmetic mean of a very large number of independent identically distributed random variables, then for any preassigned distance ε an observed value of X̄ will, with high probability, be within ε of E[X̄]. In this section we discuss some conditions under which an observed value of a random variable may be identified with its mean.

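If X has finite mean E[X] and variance σ²[X], then the condition that an observed value of X is, with high probability, within a preassigned distance ε from its mean may be obtained from Chebyshev's inequality: for any ε > 0

$$P\big[\,|X - E[X]| \le \varepsilon\,\big] \ge 1 - \frac{\sigma^2[X]}{\varepsilon^2}. \tag{6.1}$$

From (6.1) one obtains these conclusions:

$$P\big[\,|X - E[X]| \le \varepsilon\,\big] \ge \begin{cases} 95\% & \text{if } \varepsilon \ge 4.5\,\sigma[X], \\ 99\% & \text{if } \varepsilon \ge 10\,\sigma[X]. \end{cases} \tag{6.2}$$

If X may be assumed to be approximately normally distributed, then

$$P\big[\,|X - E[X]| \le \varepsilon\,\big] \approx \Phi\!\left(\frac{\varepsilon}{\sigma[X]}\right) - \Phi\!\left(-\frac{\varepsilon}{\sigma[X]}\right). \tag{6.3}$$

From (6.3) one obtains these conclusions:

$$P\big[\,|X - E[X]| \le \varepsilon\,\big] \ge \begin{cases} 95\% & \text{if } \varepsilon \ge 1.96\,\sigma[X], \\ 99\% & \text{if } \varepsilon \ge 2.58\,\sigma[X]. \end{cases} \tag{6.4}$$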

SEC. 6 THE MEASUREMENT SIGNAL-TO-NOISE RATIO 379


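As a measure of how close the observed value of X will be to its mean E[X], one often uses not the absolute deviation |X − E[X]| but the relative deviation

$$\frac{X - E[X]}{E[X]} = \frac{X}{E[X]} - 1, \tag{6.5}$$

assuming that E[X] ≠ 0.

Chebyshev's inequality may be reformulated in terms of the relative deviation: for any δ > 0

$$P\left[\,\left|\frac{X - E[X]}{E[X]}\right| \le \delta\,\right] \ge 1 - \frac{1}{\delta^2}\,\frac{\sigma^2[X]}{E^2[X]}. \tag{6.6}$$

From (6.6) one obtains these conclusions:

$$P\left[\,\left|\frac{X - E[X]}{E[X]}\right| \le \delta\,\right] \ge \begin{cases} 95\% & \text{if } \delta \ge 4.5\,\dfrac{\sigma[X]}{|E[X]|}, \\[2ex] 99\% & \text{if } \delta \ge 10\,\dfrac{\sigma[X]}{|E[X]|}. \end{cases} \tag{6.7}$$

Similarly, if X is approximately normally distributed,

$$P\left[\,\left|\frac{X - E[X]}{E[X]}\right| \le \delta\,\right] \ge \begin{cases} 95\% & \text{if } \delta \ge 1.96\,\dfrac{\sigma[X]}{|E[X]|}, \\[2ex] 99\% & \text{if } \delta \ge 2.58\,\dfrac{\sigma[X]}{|E[X]|}. \end{cases} \tag{6.8}$$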
From the foregoing inequalities we obtain this basic conclusion for a random variable X with nonzero mean and finite variance. In order that the percentage error of X as an estimate of E[X] may with high probability be small, it is sufficient that the ratio

$$\frac{|E[X]|}{\sigma[X]} \tag{6.9}$$

be large. The quantity in (6.9) is called the measurement signal-to-noise ratio* of the random variable X.

How large must the measurement signal-to-noise ratio of a random variable be in order that its observed value X be a good estimate of its mean? By (6.7) and (6.8), various answers to this question can be obtained.

* The measurement signal-to-noise ratio of a random variable is the reciprocal of the coefficient of variation of the random variable. (For a definition of the latter, see M. G. Kendall and A. Stuart, The Advanced Theory of Statistics, Griffin, London, 1958, p. 47.)


For example, if it is desired that

$$P\left[\,\left|\frac{X - E[X]}{E[X]}\right| \le 10\%\,\right] \ge 95\%, \tag{6.10}$$

then the measurement signal-to-noise ratio must satisfy approximately

$$\frac{|E[X]|}{\sigma[X]} \ge \begin{cases} 45 & \text{if Chebyshev's inequality applies,} \\ 20 & \text{if the normal approximation applies.} \end{cases} \tag{6.11}$$
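Under the same assumptions as (6.7) and (6.8), the requirement can be inverted mechanically: the condition δ ≥ k·σ[X]/|E[X]| is the same as |E[X]|/σ[X] ≥ k/δ. The sketch below (a hypothetical helper `required_snr`) applies this rule. For δ = 10% and probability 95% it gives about 44.7 and 19.6, which the text rounds to the 45 and 20 of (6.11).

```python
import math
from statistics import NormalDist


def required_snr(delta, prob, method="normal"):
    """Smallest |E[X]|/sigma[X] for which (6.7) or (6.8) guarantees
    P[|X - E[X]| / |E[X]| <= delta] >= prob."""
    if method == "chebyshev":
        k = 1.0 / math.sqrt(1.0 - prob)             # from (6.6)
    elif method == "normal":
        k = NormalDist().inv_cdf(0.5 + prob / 2.0)  # normal approximation
    else:
        raise ValueError("method must be 'chebyshev' or 'normal'")
    # delta >= k * sigma[X]/|E[X]|  is equivalent to  SNR >= k / delta.
    return k / delta


print(round(required_snr(0.10, 0.95, "chebyshev"), 1))
print(round(required_snr(0.10, 0.95, "normal"), 1))
```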


The measurement signal-to-noise ratio of various random variables is given in Table 6A. One sees that for most of the random variables given, the measurement signal-to-noise ratio is proportional to the square root of some parameter. For example, suppose the number of particles


TABLE 6A
MEASUREMENT SIGNAL-TO-NOISE RATIO OF RANDOM VARIABLES
OBEYING VARIOUS PROBABILITY LAWS

Probability law of X                      | E[X]                   | σ²[X]                                     | (E[X]/σ[X])²
Poisson, with parameter λ                 | λ                      | λ                                         | λ
Binomial, with parameters n and p         | np                     | np(1 − p)                                 | np/(1 − p)
Geometric, with parameter p (q = 1 − p)   | q/p                    | q/p²                                      | q
Uniform over the interval a to b          | (a + b)/2              | (b − a)²/12                               | 3[(b + a)/(b − a)]²
Normal, with parameters m and σ           | m                      | σ²                                        | m²/σ²
Exponential, with parameter λ             | 1/λ                    | 1/λ²                                      | 1
χ², with n degrees of freedom             | n                      | 2n                                        | n/2
F, with n1 and n2 degrees of freedom      | n2/(n2 − 2), if n2 > 2 | 2n2²(n1 + n2 − 2)/[n1(n2 − 2)²(n2 − 4)],  | n1(n2 − 4)/[2(n1 + n2 − 2)],
                                          |                        | if n2 > 4                                 | if n2 > 4


emitted by a radioactive source during a certain time interval is being counted. The number of particles emitted obeys a Poisson probability law with some parameter λ whose value is unknown. If the true value of λ is known to be very large, then the observed number X of emitted particles is a good estimate of λ, since the measurement signal-to-noise ratio of X is √λ. It is shown in Chapter 10 that many of the random variables in Table 6A are approximately normally distributed in cases in which their measurement signal-to-noise ratio is very large.
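Because Table 6A lists E[X] and σ²[X], the ratio (6.9) follows mechanically. The sketch below uses a hypothetical helper `snr` and arbitrary parameter values to check a few rows of the last column. For instance, a Poisson count with λ = 400 has measurement signal-to-noise ratio 20, the level that (6.11) associates with 10% precision at 95% under the normal approximation.

```python
import math


def snr(mean, variance):
    """Measurement signal-to-noise ratio |E[X]| / sigma[X] of (6.9)."""
    return abs(mean) / math.sqrt(variance)


# Arbitrary parameter values; means and variances as listed in Table 6A.
lam = 400.0          # Poisson: E = lambda, Var = lambda
n, p = 1000, 0.3     # binomial: E = np, Var = np(1 - p)
df = 200             # chi-square: E = n, Var = 2n

print(snr(lam, lam), math.sqrt(lam))
print(snr(n * p, n * p * (1 - p)), math.sqrt(n * p / (1 - p)))
print(snr(df, 2 * df), math.sqrt(df / 2))
```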


► Example 6A. The density of an ideal gas. An ideal gas can be regarded as a collection of n molecules distributed randomly in a volume V. The density of the gas in a subvolume v, contained in V, is a random variable d given by d = Nm/v, in which m is the mass of one gas molecule and N is the number of molecules in the volume v. Since it is assumed that each of the n molecules has an independent probability v/V of being in the subvolume v, the number N of molecules in v obeys a binomial probability law with mean E[N] = nv/V and variance σ²[N] = npq, in which we have let p = v/V and q = 1 − p. The density then has mean E[d] = nm/V. In speaking of the density of gas in the volume v, one usually has in mind the mean density. The question naturally arises: under what circumstances is the relative deviation (d − E[d])/E[d] of the true density d from the mean density E[d] within a preassigned percentage error δ? More specifically, what values must n, m, v, and V have in order that

$$P\left[\,\left|\frac{d - E[d]}{E[d]}\right| \le \delta\,\right] \ge 1 - \eta, \tag{6.12}$$

in which δ and η are preassigned positive quantities? By Chebyshev's inequality,

$$P\left[\,\left|\frac{d - E[d]}{E[d]}\right| \le \delta\,\right] \ge 1 - \frac{q}{\delta^2 n p}. \tag{6.13}$$

Consequently, if the quantities n, m, v, and V are such that

$$\frac{1 - (v/V)}{n\,(v/V)} \le \delta^2 \eta, \tag{6.14}$$

then (6.12) holds. Because of the enormous size of n (which is of the order of … per cm³), one would expect (6.14) to be satisfied for η = δ = …, say, as long as (v/V) is not too small. In this case it makes sense to speak of the density of gas in v, even though the number of molecules in v is not fixed but fluctuates. However, if v/V is very small, the fluctuations become sufficiently pronounced, and the ordinary notion of density, which identifies


density with the mean density, loses its meaning. The "density fluctuations" in small volumes can actually be detected experimentally inasmuch as they cause scattering of sufficiently short wavelengths. ◄

► Example 6B. The √n law. The physicist Erwin Schrödinger has pointed out in the following statement (What is Life?, Cambridge University Press, …, p. 16) the degree of inaccuracy to be expected in any physical law, the so-called √n law: "The laws of physics and physical chemistry are inaccurate within a probable relative error of the order of 1/√n, where n is the number of molecules that cooperate to bring about that law." From the √n law Schrödinger draws the conclusion that in order for the laws of physics and chemistry to be sufficient to explain the laws governing the behavior of living organisms it is necessary that the biologically relevant processes of such an organism involve the cooperation of a very large number of atoms, for only in this case do the laws of physics become exact laws. Since one can show that there are "incredibly small groups of atoms, much too small to display exact statistical laws, which do play a dominating role in the very orderly and lawful events within a living organism," Schrödinger conjectures that it may not be possible to interpret life by the ordinary laws of physics, based on the "statistical mechanism which produces order from disorder." We state here a mathematical formulation of the √n law. Let X1, X2, ..., Xn be n independent random variables identically distributed as a random variable X. For the sum Sn = X1 + X2 + ··· + Xn and the sample mean Mn = Sn/n the measurement signal-to-noise ratios are given by

$$\frac{E[S_n]}{\sigma[S_n]} = \frac{E[M_n]}{\sigma[M_n]} = \sqrt{n}\,\frac{E[X]}{\sigma[X]}.$$

In words, the sum or average of n repeated independent measurements of a random variable X has a measurement signal-to-noise ratio of the order of √n. ◄

► Example 6C. Can the energy of an ideal gas be both constant and a χ²-distributed random variable? In example … of Chapter 7 it is shown that, for a gas obeying Gibbs's canonical distribution, the energy E of the gas may be regarded as a random variable possessing a χ² distribution with 3N degrees of freedom, in which N is the number of particles comprising the gas. Does this mean that a gas of constant energy cannot be regarded as obeying Gibbs's canonical distribution? The answer to this question is no; from a practical point of view there is no contradiction in regarding the energy E of the gas as being both a constant and a random variable with a χ² distribution if the number of degrees of freedom is very large, for then the measurement signal-to-noise ratio of E, which, from Table 6A, is equal to (3N/2)^{1/2}, is also very large. ◄

The terminology "signal-to-noise ratio" originated in communications theory. The mean E[X] of a random variable X is regarded as a signal that one is attempting to receive (say, at a radio receiver). However, X is actually received. The difference between the desired value E[X] and the received value X is called noise. The less noise present, the better one is able to receive the signal accurately. As a measure of signal strength to noise strength, one takes the signal-to-noise ratio defined by (6.9). The higher the signal-to-noise ratio, the more accurate the observed value X as an estimate of the desired value E[X].

Any time a scientist makes a measurement he is attempting to obtain a signal in the presence of noise or, equivalently, to estimate the mean of a random variable. The skill of the experimental scientist lies in being able to conduct experiments that have a high measurement signal-to-noise ratio. However, there are experimental situations in which this may not be possible. For example, there is an inherent limit on how small one can make the variance of measurements made with electronic devices. This limit arises from the noise or spontaneous current fluctuations present in such devices (see example … of Chapter 6). To measure weak signals in the presence of noise, that is, to measure the mean of a random variable with a low measurement signal-to-noise ratio, one should have a good knowledge of the modern theories of statistical inference.

On the one hand, the scientist and engineer should know statistics in order to interpret the statistical significance of the data he has obtained. On the other hand, a knowledge of statistics will help the scientist or engineer to solve the basic problem confronting him in taking measurements: given a parameter θ which he wishes to measure, to find random variables X1, X2, ..., Xn whose observed values can be used to form estimates of θ that are best according to some criteria.

Measurement signal-to-noise ratios play a basic role in the evaluation of modern electronic apparatus. The reader interested in such questions may consult J. J. Freeman, Principles of Noise, …, Chapters 7 and 9.

EXERCISES

6.1. A random variable X has an unknown mean and known variance … . How large a random sample should one take if the probability is to be at least 99% that the sample mean will not differ from the true mean E[X] by (i)


more than 0.1, (ii) more than 10% of the standard deviation of X, (iii) more than 10% of the true mean of X, if the true mean of X is known to be greater than 10.


6.2. Let X1, X2, ..., Xn be independent normally distributed random variables with known mean 0 and unknown common variance σ². Define

$$S_n = \frac{1}{n}\left(X_1^2 + X_2^2 + \cdots + X_n^2\right).$$

Since E[Sn] = σ², Sn might be used as an estimate of σ². How large should n be in order to have a measurement signal-to-noise ratio of Sn greater than 20? If the measurement signal-to-noise ratio of Sn is greater than 20, how good is Sn as an estimate of σ²?
6.3. Consider a gas composed of molecules (with mass of the order of 10^… grams and at room temperature) whose velocities obey the Maxwell-Boltzmann law (see exercise 1.15). Show that one may assume that all the molecules move with the same velocity, which may be taken as either the mean velocity, the root mean square velocity, or the most probable velocity.


7. CONDITIONAL EXPECTATION. BEST LINEAR 
PREDICTION 


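An important tool in the study of the relationships that exist between two jointly distributed random variables, X and Y, is provided by the notion of conditional expectation. In section 11 of Chapter 7 the notion of the conditional distribution function F_{Y|X}(· | x) of the random variable Y, given the random variable X, is defined. We now define the conditional mean of Y, given X, by

$$E[Y \mid X = x] = \int_{-\infty}^{\infty} y \, dF_{Y|X}(y \mid x) = \begin{cases} \displaystyle\int_{-\infty}^{\infty} y \, f_{Y|X}(y \mid x)\, dy & \text{(continuous case)}, \\[2ex] \displaystyle\sum_{\substack{\text{over all } y \text{ such that} \\ p_{Y|X}(y \mid x) > 0}} y \, p_{Y|X}(y \mid x) & \text{(discrete case)}. \end{cases} \tag{7.1}$$

The mean of Y may then be obtained by averaging the conditional mean over the values of X:

$$E[Y] = \int_{-\infty}^{\infty} E[Y \mid X = x] \, dF_X(x) = \begin{cases} \displaystyle\int_{-\infty}^{\infty} E[Y \mid X = x]\, f_X(x)\, dx, \\[2ex] \displaystyle\sum_{\substack{\text{over all } x \text{ such that} \\ p_X(x) > 0}} E[Y \mid X = x]\, p_X(x). \end{cases} \tag{7.2}$$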

SEC. 7 CONDITIONAL EXPECTATION 385 


► Example 7A. Sampling from an urn of random composition. Let a random sample of size n be drawn without replacement from an urn containing N balls. Suppose that the number X of white balls in the urn is a random variable. Let Y be the number of white balls contained in the sample. The conditional distribution of Y, given X, is discrete, with probability mass function for x = 0, 1, ..., N and y = 0, 1, ..., n given by

$$p_{Y|X}(y \mid x) = P[Y = y \mid X = x] = \frac{\binom{x}{y}\binom{N-x}{n-y}}{\binom{N}{n}}, \tag{7.3}$$

since the conditional probability law of Y, given X, is hypergeometric. The conditional mean of Y, given X, can be readily obtained from a knowledge of the mean of a hypergeometric random variable:

$$E[Y \mid X = x] = n\,\frac{x}{N}. \tag{7.4}$$

The mean number of white balls in the sample drawn is then equal to

$$E[Y] = \sum_{x=0}^{N} E[Y \mid X = x]\, p_X(x) = \frac{n}{N}\sum_{x=0}^{N} x\, p_X(x) = n\,\frac{E[X]}{N}. \tag{7.5}$$

Now E[X]/N is the mean proportion of white balls in the urn. Consequently (7.5) is analogous to the formulas for the mean of a binomial or hypergeometric random variable. Note that the probability law of Y is hypergeometric if X is hypergeometric and Y is binomial if X is binomial. (See theoretical exercise 4.1 of Chapter 4.)
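Formula (7.5) can be checked by exact computation. This sketch assumes, purely for illustration, an urn of N = 10 balls whose number X of white balls is binomial with parameters 10 and 3/10, and a sample of size n = 4. It averages the hypergeometric conditional means over the law of X using exact fractions and recovers nE[X]/N = 6/5.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(y, n, x, N):
    """P[Y = y | X = x] of (7.3): y white balls in a sample of n drawn
    without replacement from N balls of which x are white."""
    return Fraction(comb(x, y) * comb(N - x, n - y), comb(N, n))


def mean_white_in_sample(n, N, px):
    """E[Y] = sum over x of E[Y | X = x] p_X(x), as in (7.5); px maps x to P[X = x]."""
    total = Fraction(0)
    for x, prob in px.items():
        cond_mean = sum(y * hypergeom_pmf(y, n, x, N) for y in range(n + 1))
        total += cond_mean * prob
    return total


# Illustrative assumption: N = 10 balls, X binomial with parameters 10 and 3/10.
N, n = 10, 4
p = Fraction(3, 10)
px = {x: comb(N, x) * p**x * (1 - p) ** (N - x) for x in range(N + 1)}
mean_x = sum(x * q for x, q in px.items())
print(mean_white_in_sample(n, N, px), n * mean_x / N)
```

`math.comb` returns 0 when the lower index exceeds the upper one, so impossible outcomes (for example y > x) automatically receive probability zero.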

► Example 7B. The conditional mean of jointly normal random variables. Two random variables, X1 and X2, are jointly normally distributed if they possess a joint probability density function given by (2.18). Then

$$f_{X_2|X_1}(x_2 \mid x_1) = \frac{1}{\sqrt{2\pi}\,\sigma_2\sqrt{1-\rho^2}}\exp\left\{-\frac{1}{2}\left(\frac{x_2 - m_2 - (\sigma_2/\sigma_1)\,\rho\,(x_1 - m_1)}{\sigma_2\sqrt{1-\rho^2}}\right)^2\right\}. \tag{7.6}$$

Consequently, the conditional mean of X2, given X1, is given by

$$E[X_2 \mid X_1 = x_1] = m_2 + \frac{\sigma_2}{\sigma_1}\,\rho\,(x_1 - m_1) = \alpha_1 + \beta_1 x_1, \tag{7.7}$$

in which we define the constants α1 and β1 by

$$\alpha_1 = m_2 - \frac{\sigma_2}{\sigma_1}\,\rho\, m_1, \qquad \beta_1 = \frac{\sigma_2}{\sigma_1}\,\rho. \tag{7.8}$$

Similarly,

$$E[X_1 \mid X_2 = x_2] = \alpha_2 + \beta_2 x_2, \qquad \alpha_2 = m_1 - \frac{\sigma_1}{\sigma_2}\,\rho\, m_2, \qquad \beta_2 = \frac{\sigma_1}{\sigma_2}\,\rho. \tag{7.9}$$

From (7.7) it is seen that the conditional mean of a random variable X2, given the value x1 of a random variable X1 with which X2 is jointly normally distributed, is a linear function of x1. Except in the case in which the two random variables X1 and X2 are jointly normally distributed, it is generally to be expected that E[X2 | X1 = x1] is a nonlinear function of x1. ◄
The conditional mean of one random variable, given another random variable, represents one possible answer to the problem of prediction. Suppose that a prospective father of height x1 wishes to predict the height of his unborn son. If the height of the son is regarded as a random variable X2, and the height x1 of the father is regarded as an observed value of a random variable X1, then as the prediction of the son's height we take the conditional mean E[X2 | X1 = x1]. The justification of this procedure is that the conditional mean E[X2 | X1 = x1] may be shown to have the property that

$$\begin{aligned} E\big[(X_2 - E[X_2 \mid X_1])^2\big] &= \iint (x_2 - E[X_2 \mid X_1 = x_1])^2 \, dF_{X_1,X_2}(x_1, x_2) \\ &\le \iint (x_2 - g(x_1))^2 \, dF_{X_1,X_2}(x_1, x_2) = E\big[(X_2 - g(X_1))^2\big] \end{aligned} \tag{7.10}$$

for any function g(·) for which the last written integral exists. In words, (7.10) is interpreted to mean that if X2 is to be predicted by a function g(X1) of the random variable X1, then the conditional mean E[X2 | X1] has the smallest mean square error among all possible predictors g(X1).
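Inequality (7.10) can be seen on a small discrete example. In the sketch below the joint law is hypothetical: X1 is uniform on {−1, 0, 1} and X2 = X1², so the conditional mean E[X2 | X1] = X1² predicts without error. Every other predictor has strictly larger mean square error, including the constant E[X2] = 2/3, which here is also the best linear predictor because X1 and X2 are uncorrelated.

```python
from fractions import Fraction

# Hypothetical joint law: X1 uniform on {-1, 0, 1} and X2 = X1**2.
joint = {(-1, 1): Fraction(1, 3), (0, 0): Fraction(1, 3), (1, 1): Fraction(1, 3)}


def conditional_mean(joint, x1):
    """E[X2 | X1 = x1] for a discrete joint probability mass function."""
    mass = sum(p for (a, _), p in joint.items() if a == x1)
    return sum(b * p for (a, b), p in joint.items() if a == x1) / mass


def mse(joint, g):
    """Mean square error E[(X2 - g(X1))**2] of the predictor g(X1)."""
    return sum(p * (b - g(a)) ** 2 for (a, b), p in joint.items())


best = mse(joint, lambda a: conditional_mean(joint, a))
others = {
    "constant 2/3": lambda a: Fraction(2, 3),
    "identity": lambda a: a,
    "0.5 + 0.2 * x1": lambda a: Fraction(1, 2) + Fraction(1, 5) * a,
}
for name, g in others.items():
    print(f"{name}: {mse(joint, g)} >= {best}")
```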
From (7.7) it is seen that in the case in which the random variables are jointly normally distributed the problem of computing the conditional mean E[X2 | X1 = x1] may be reduced to that of computing the constants α1 and β1, for which one requires a knowledge only of the means, variances, and correlation coefficient of X1 and X2. If these moments are not known, they must be estimated from observed data. The part of statistics concerned with the estimation of α1 and β1 is called regression analysis.

Suppose next that the joint probability law of the random variables X1 and X2 is such that the calculation of the conditional mean E[X2 | X1 = x1] is intractable. Suppose, however, that one knows the means, variances (assumed to be positive), and correlation coefficient of X1 and X2. Then the prediction problem may be solved by forming the best linear predictor of X2, given X1, denoted by E*[X2 | X1 = x1]. The best linear predictor of X2, given X1, is defined as that linear function a + bX1 of the random variable X1 that minimizes the mean square error of prediction E[(X2 − (a + bX1))²] involved in the use of a + bX1 as a predictor of X2. Now

$$\begin{aligned} \frac{\partial}{\partial a}\, E\big[(X_2 - (a + bX_1))^2\big] &= -2\,E\big[X_2 - (a + bX_1)\big], \\ \frac{\partial}{\partial b}\, E\big[(X_2 - (a + bX_1))^2\big] &= -2\,E\big[(X_2 - (a + bX_1))\,X_1\big]. \end{aligned} \tag{7.11}$$

Solving for the values of a and b, denoted by α and β, at which these derivatives are equal to 0, one sees that α and β satisfy the equations

$$\alpha + \beta\, E[X_1] = E[X_2], \qquad \alpha\, E[X_1] + \beta\, E[X_1^2] = E[X_1 X_2]. \tag{7.12}$$

Therefore, E*[X2 | X1 = x1] = α + βx1, in which

$$\alpha = E[X_2] - \beta\, E[X_1], \qquad \beta = \frac{\operatorname{Cov}[X_1, X_2]}{\operatorname{Var}[X_1]} = \frac{\sigma[X_2]}{\sigma[X_1]}\,\rho(X_1, X_2). \tag{7.13}$$

Comparing (7.7) and (7.13), one sees that the best linear predictor E*[X2 | X1 = x1] coincides with the best predictor, or conditional mean, E[X2 | X1 = x1], in the case in which the random variables X1 and X2 are jointly normally distributed.

We can readily compute the mean square error of prediction achieved with the use of the best linear predictor. We have

$$\begin{aligned} E\big[(X_2 - E^*[X_2 \mid X_1])^2\big] &= E\big[\{(X_2 - E[X_2]) - \beta\,(X_1 - E[X_1])\}^2\big] \\ &= \operatorname{Var}[X_2] + \beta^2 \operatorname{Var}[X_1] - 2\beta \operatorname{Cov}[X_2, X_1] \\ &= \operatorname{Var}[X_2] - \frac{\operatorname{Cov}^2[X_1, X_2]}{\operatorname{Var}[X_1]} \\ &= \operatorname{Var}[X_2]\,\{1 - \rho^2(X_1, X_2)\}. \end{aligned} \tag{7.14}$$

From (7.14) one obtains the important conclusion that the closer the correlation coefficient between two random variables is to 1 in absolute value, the smaller the mean square error of prediction involved in predicting the value of one of the random variables from the value of the other.
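To check (7.13) and (7.14) numerically, the following sketch computes α and β from the moments of a small hypothetical discrete joint distribution. It then evaluates the mean square error of α + βX1 directly and compares it with Var[X2]{1 − ρ²(X1, X2)}. Exact fractions are used so the two values agree exactly rather than to rounding error.

```python
from fractions import Fraction


def moments(joint):
    """Means, variances and covariance of (X1, X2) from a discrete joint pmf."""
    m1 = sum(p * a for (a, _), p in joint.items())
    m2 = sum(p * b for (_, b), p in joint.items())
    v1 = sum(p * (a - m1) ** 2 for (a, _), p in joint.items())
    v2 = sum(p * (b - m2) ** 2 for (_, b), p in joint.items())
    c = sum(p * (a - m1) * (b - m2) for (a, b), p in joint.items())
    return m1, m2, v1, v2, c


def best_linear_predictor(joint):
    """(alpha, beta) of (7.13): beta = Cov/Var[X1], alpha = E[X2] - beta E[X1]."""
    m1, m2, v1, _, c = moments(joint)
    beta = c / v1
    return m2 - beta * m1, beta


# Hypothetical joint law: four equally likely points (x1, x2).
joint = {(0, 0): Fraction(1, 4), (1, 1): Fraction(1, 4),
         (1, 3): Fraction(1, 4), (2, 2): Fraction(1, 4)}

alpha, beta = best_linear_predictor(joint)
mse_linear = sum(p * (b - (alpha + beta * a)) ** 2 for (a, b), p in joint.items())
_, _, v1, v2, c = moments(joint)
rho_sq = c * c / (v1 * v2)
print(alpha, beta, mse_linear, v2 * (1 - rho_sq))  # last two agree, as in (7.14)
```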


The Phenomenon of "Spurious" Correlation. Given three random variables U, V, and W, let X and Y be defined by

$$X = U + W, \quad Y = V + W \qquad \text{or} \qquad X = \frac{U}{W}, \quad Y = \frac{V}{W} \tag{7.15}$$

(or in some similar way) as functions of U, V, and W. The reader should be careful not to infer the existence of a correlation between U and V from the existence of a correlation between X and Y.


► Example 7C. Do storks bring babies? Let W be the number of women of child-bearing age in a certain geographical area, U, the number of storks in the area, and V, the number of babies born in the area during a specified period of time. The random variables X and Y, defined by

$$X = \frac{U}{W}, \qquad Y = \frac{V}{W}, \tag{7.16}$$

then represent, respectively, the number of storks per woman and the number of babies born per woman in the area. If the correlation coefficient ρ(X, Y) between X and Y is close to 1, does that not prove that storks bring babies? Indeed, even if it is proved only that the correlation coefficient ρ(X, Y) is positive, would that not prove that the presence of storks in an area has a beneficial influence on the birth rate there? The reader interested in a discussion of these delightful questions would be well advised to consult J. Neyman, Lectures and Conferences on Mathematical Statistics and Probability, Washington, D.C., 1952, pp. …


THEORETICAL EXERCISES 


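In the following exercises let X1, X2, and Y be jointly distributed random variables whose first and second moments are assumed known and whose variances are positive.

7.1. The best linear predictor, denoted by E*[Y | X1, X2], of Y, given X1 and X2, is defined as the linear function a + b1X1 + b2X2 which minimizes E[(Y − (a + b1X1 + b2X2))²]. Show that

$$E^*[Y \mid X_1, X_2] = E[Y] + \beta_1 (X_1 - E[X_1]) + \beta_2 (X_2 - E[X_2]),$$

where

$$\beta_1 = \operatorname{Cov}[Y, X_1]\,\Sigma_{11} + \operatorname{Cov}[Y, X_2]\,\Sigma_{21}, \qquad \beta_2 = \operatorname{Cov}[Y, X_1]\,\Sigma_{12} + \operatorname{Cov}[Y, X_2]\,\Sigma_{22},$$

in which we define

$$\Sigma_{11} = \frac{\operatorname{Var}[X_2]}{\Delta}, \quad \Sigma_{22} = \frac{\operatorname{Var}[X_1]}{\Delta}, \quad \Sigma_{12} = \Sigma_{21} = -\frac{\operatorname{Cov}[X_1, X_2]}{\Delta}, \quad \Delta = \operatorname{Var}[X_1]\operatorname{Var}[X_2]\{1 - \rho^2(X_1, X_2)\}.$$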


7.2. The residual of Y with respect to X1 and X2, denoted by η[Y | X1, X2], is defined by

η[Y | X1, X2] = Y − E*[Y | X1, X2].

Show that η[Y | X1, X2] is uncorrelated with X1 and X2. Consequently, conclude that the mean square error of prediction, called the residual variance of Y, given X1 and X2, is given by

$$E\big[\eta^2[Y \mid X_1, X_2]\big] = \operatorname{Var}[Y] - \operatorname{Var}\big[E^*[Y \mid X_1, X_2]\big].$$

Next show that the variance of the predictor is given by

$$\begin{aligned} \operatorname{Var}\big[E^*[Y \mid X_1, X_2]\big] &= \beta_1^2 \operatorname{Var}[X_1] + \beta_2^2 \operatorname{Var}[X_2] + 2\beta_1\beta_2 \operatorname{Cov}[X_1, X_2] \\ &= \Sigma_{11}\operatorname{Cov}^2[Y, X_1] + \Sigma_{22}\operatorname{Cov}^2[Y, X_2] + 2\Sigma_{12}\operatorname{Cov}[Y, X_1]\operatorname{Cov}[Y, X_2]. \end{aligned}$$

The positive quantity R[Y | X1, X2], defined by

$$R^2[Y \mid X_1, X_2] = \frac{\operatorname{Var}\big[E^*[Y \mid X_1, X_2]\big]}{\operatorname{Var}[Y]},$$

is called the multiple correlation coefficient between Y and the random vector (X1, X2). To understand the meaning of the multiple correlation coefficient, express in terms of it the residual variance of Y, given X1 and X2.

7.3. The partial correlation coefficient of X1 and X2 with respect to Y is defined by

ρ[X1, X2 | Y] = ρ(η[X1 | Y], η[X2 | Y]),

in which η[Xi | Y] = Xi − E*[Xi | Y] for i = 1, 2. Show that

$$\rho[X_1, X_2 \mid Y] = \frac{\rho(X_1, X_2) - \rho(X_1, Y)\,\rho(X_2, Y)}{\sqrt{1 - \rho^2(X_1, Y)}\,\sqrt{1 - \rho^2(X_2, Y)}}.$$


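7.4. (Continuation of example 7A.) Show that

$$\operatorname{Var}[Y] = n\,\frac{E[X]}{N}\left(1 - \frac{E[X]}{N}\right)\frac{N - n}{N - 1} + \frac{n(n - 1)}{N(N - 1)}\operatorname{Var}[X]. \tag{7.17}$$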

EXERCISES 

7.1. Let X1, X2, X3 be jointly distributed random variables with zero means, unit variances, and covariances Cov[X1, X2] = 0.80, Cov[X1, X3] = −0.40, Cov[X2, X3] = −0.60. Find (i) the best linear predictor of X1, given X2, (ii) the best linear predictor of X1, given X3, (iii) the partial correlation between X1 and X3, given X2, (iv) the best linear predictor of X1, given X2 and X3, (v) the residual variance of X1, given X2 and X3, (vi) the residual variance of X1, given X2.


7.2. Find the conditional mean of Y, given X, if X and Y are jointly continuous random variables with a joint probability density function f_{X,Y}(x, y) vanishing except for x > 0, y > 0, and in the case in which x > 0, y > 0 given by

$$\text{(i)}\ \frac{4}{5}\,(x + 3y)\,e^{-x-2y}, \qquad \text{(ii)}\ \ldots, \qquad \text{(iii)}\ \frac{9}{2}\,\frac{1 + x + y}{(1 + x)^4\,(1 + y)^4}.$$

7.3. Let X = cos 2πU, Y = sin 2πU, in which U is uniformly distributed on 0 to 1. Show that for |x| < 1

E*[Y | X = x] = 0,   E[… | X = x] = … .

Find the mean square error of prediction achieved by the use of (i) the best linear predictor, (ii) the best predictor.

7.4. Let U, V, and W be uncorrelated random variables with equal variances. Let X = U + W, Y = V + W. Show that

ρ(X, W) = ρ(Y, W) = 1/√2,   ρ(X, Y) = 0.5.


CHAPTER 9 


Sums of Independent 
Random Variables 


Chapters 9 and 10 are much less elementary in character than the first eight chapters of this book. They constitute an introduction to the limit theorems of probability theory and to the role of characteristic functions in probability theory. These chapters seek to provide a careful and rigorous derivation of the law of large numbers and the central limit theorem.

In this chapter we treat the problem of finding the probability law of a random variable that arises as the sum of independent random variables. A major tool in this study is the characteristic function of a random variable, introduced in section 2. In section 3 it is shown that the probability law of a random variable can be determined from its characteristic function. Section 4 discusses some consequences of the basic result that the characteristic function of a sum of independent random variables is the product of the characteristic functions of the individual random variables. Section 5 gives the proofs of the inversion formulas stated in section 3.


1. THE PROBLEM OF ADDITION OF INDEPENDENT 
RANDOM VARIABLES 


A large number of the problems which arise in applications of probability theory may be regarded as special cases of the following general problem, which we call the problem of addition of independent random variables: Find, either exactly or approximately, the probability law of a random


392 SUMS OF INDEPENDENT RANDOM VARIABLES cH. 9 


variable that arises as the sum of n independent random variables X1, X2, ..., Xn, … . The fundamental role played by this problem in probability theory is best described by a quotation from an article by Harald Cramér, "Problems in Probability Theory," Annals of Mathematical Statistics, Volume 18 (1947), p. 169.


During the early development of the theory of probability, the majority of problems considered were connected with gambling. The gain of a player in a certain game may be regarded as a random variable, and his total gain in a sequence of repetitions of the game is the sum of a number of independent variables, each of which represents the gain in a single performance of the game. Accordingly, a great amount of work was devoted to the study of the probability distributions of such sums. A little later, problems of a similar type appeared in connection with the theory of errors of observation, when the total error was considered as the sum of a certain number of partial errors due to mutually independent causes. At first, only particular cases were considered; but gradually general types of problems began to arise, and in the classical work of Laplace several results are given concerning the general problem to study the distribution of a sum

Sn = X1 + X2 + ··· + Xn

of independent variables, when the distributions of the Xj are given. This problem may be regarded as the very starting point of a large number of those investigations by which the modern Theory of Probability was created. The efforts to prove certain statements of Laplace, and to extend his results further in various directions, have largely contributed to the introduction of rigorous foundations of the subject, and to the development of the analytical methods.
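For discrete random variables with finitely many values, the problem of addition can be solved exactly, if laboriously, by summing over all combinations of values. The sketch below uses a hypothetical helper, not a method from the text, to form the probability mass function of the sum of two independent discrete random variables; applying it repeatedly handles n summands. The work grows quickly with n and with the number of values, which is one reason the characteristic-function methods of this chapter are needed.

```python
from collections import defaultdict
from fractions import Fraction


def pmf_of_sum(pmf_a, pmf_b):
    """Probability mass function of A + B for independent discrete A and B.

    Each pmf is a dict mapping a value to its probability; independence lets
    P[A = a, B = b] factor as P[A = a] * P[B = b].
    """
    out = defaultdict(Fraction)
    for a, pa in pmf_a.items():
        for b, pb in pmf_b.items():
            out[a + b] += pa * pb
    return dict(out)


# Illustration: the sum of three independent fair dice.
die = {k: Fraction(1, 6) for k in range(1, 7)}
two = pmf_of_sum(die, die)
three = pmf_of_sum(two, die)
print(two[7], three[10], sum(three.values()))
```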


In this chapter we discuss the methods and notions by which a precise 
formulation and solution is given to the problem of addition of independent 
random variables. To begin with, in this section we discuss the two most 
important ways in which this problem can arise, namely in the analysis of 
sample averages and in the analysis of random walks. 

Sample Averages. We have defined a sample of size n of a random variable X as a set of n jointly distributed random variables X1, X2, ..., Xn whose individual probability laws coincide, for k = 1, 2, ..., n, with the probability law of X; in particular, the distribution function F_{Xk}(·) of Xk coincides with the distribution function F_X(·) of X. We have defined the sample as a random sample if the random variables X1, X2, ..., Xn are independent.


Given a sample X1, X2, ..., Xn of size n of the random variable X and any Borel function g(·) of a real variable, we define the sample average of g(·), denoted by Mn[g(x)], as the arithmetic mean of the values g(X1), g(X2), ..., g(Xn) of the function at the members of the sample; in symbols,

$$M_n[g(x)] = \frac{1}{n}\sum_{k=1}^{n} g(X_k). \tag{1.1}$$


SEC. 1 THE PROBLEM OF ADDITION 393 


Of special importance are the sample mean $m_n$, defined by

(1.2)    $m_n = M_n[x] = \frac{1}{n}\sum_{k=1}^{n} X_k$,

and the sample variance $S_n^2$, defined by

(1.3)    $S_n^2 = M_n[(x - m_n)^2] = M_n[x^2] - m_n^2 = \frac{1}{n}\sum_{k=1}^{n} (X_k - m_n)^2 = \frac{1}{n}\sum_{k=1}^{n} X_k^2 - \left(\frac{1}{n}\sum_{k=1}^{n} X_k\right)^2$.
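As a small illustration of (1.1) through (1.3), the following Python sketch computes sample averages for a fixed list of observed values. The data values and the function names `sample_average`, `sample_mean`, and `sample_variance` are hypothetical choices for this example; note that, as in (1.3), the sample variance divides by n rather than by n - 1.

```python
from typing import Callable, Sequence


def sample_average(sample: Sequence[float], g: Callable[[float], float]) -> float:
    """M_n[g(x)]: arithmetic mean of g evaluated at each member of the sample, (1.1)."""
    return sum(g(x) for x in sample) / len(sample)


def sample_mean(sample: Sequence[float]) -> float:
    """m_n = M_n[x], equation (1.2)."""
    return sample_average(sample, lambda x: x)


def sample_variance(sample: Sequence[float]) -> float:
    """S_n^2 = M_n[(x - m_n)^2], the first form in (1.3); divides by n, not n - 1."""
    m = sample_mean(sample)
    return sample_average(sample, lambda x: (x - m) ** 2)


if __name__ == "__main__":
    xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    m = sample_mean(xs)
    s2 = sample_variance(xs)
    # The second form of (1.3), M_n[x^2] - m_n^2, should agree with the first.
    s2_alt = sample_average(xs, lambda x: x * x) - m ** 2
    print(m, s2, s2_alt)  # 5.0 4.0 4.0
```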


For a given function $g(\cdot)$ the sample average $M_n[g(x)]$ is a random
variable, for it is a function of the random variables $X_1, X_2, \ldots, X_n$.
The value of $M_n[g(x)]$ will, in general, be different when it is computed on
the basis of two different samples of size n. The sample average $M_n[g(x)]$,
like any other random variable, has a mean $E[M_n[g(x)]]$, a variance
$\mathrm{Var}[M_n[g(x)]]$, a distribution function $F_{M_n[g(x)]}(\cdot)$, a moment-generating
function $\psi_{M_n[g(x)]}(\cdot)$, and, depending on whether it is a continuous or a
discrete random variable, a probability density function $f_{M_n[g(x)]}(\cdot)$ or a
probability mass function $p_{M_n[g(x)]}(\cdot)$. Our aim in this chapter and the next
is to develop techniques for computing these quantities, both exactly and
approximately, and especially to study their behavior for large sample
sizes. The reader who goes on to the study of mathematical statistics will
find that these techniques provide the framework for many of the concepts
of statistics.


To study sample averages $M_n[g(x)]$ with respect to a random sample, it
suffices to consider the sum $Y_1 + Y_2 + \cdots + Y_n$ of independent random variables
$Y_1, \ldots, Y_n$, since the random variables $Y_1 = g(X_1), \ldots, Y_n = g(X_n)$
are independent if the random variables $X_1, \ldots, X_n$ are. Thus it is seen
that the study of sample averages has been reduced to the study of sums of
independent random variables.

The Random Walk. Consider a particle that at a certain time is located
at the point 0 on a certain straight line. Suppose that it then suffers
displacements along the straight line in the form of a series of steps,
denoted by $X_1, X_2, \ldots$, in which, for any integer k, $X_k$ represents the
displacement suffered by the particle at the kth step. The size $X_k$ of the
kth step is assumed to be a random variable with a known probability law.
The particle can thus be imagined as executing a random walk along the
line, its position (denoted by $S_n$) after n steps being the sum of the n steps;
in symbols, $S_n = X_1 + X_2 + \cdots + X_n$. It is of interest to determine the
probability law of $S_n$ and, in particular, for

394 SUMS OF INDEPENDENT RANDOM VARIABLES cH. 9 


any interval a to b, the probability $P[a \le S_n \le b]$ that after n steps the
particle will lie between a and b, inclusive.
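The probability $P[a \le S_n \le b]$ can be illustrated with a short simulation. The sketch below assumes, purely for illustration, that each step $X_k$ is +1 or -1 with probability 1/2 each; the text allows any known step law. Under that assumption the number of rightward steps is binomial, which gives an exact value to compare against the Monte Carlo estimate. The function names are hypothetical.

```python
import math
import random


def walk_position(n: int, rng: random.Random) -> int:
    """Position S_n = X_1 + ... + X_n after n steps, each step +1 or -1 with probability 1/2."""
    return sum(rng.choice((-1, 1)) for _ in range(n))


def exact_prob(n: int, a: int, b: int) -> float:
    """Exact P[a <= S_n <= b] for the symmetric +-1 walk.

    With j rightward steps, S_n = 2j - n, and j is binomial with parameters n and 1/2.
    """
    hits = sum(math.comb(n, j) for j in range(n + 1) if a <= 2 * j - n <= b)
    return hits / 2 ** n


def simulated_prob(n: int, a: int, b: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of P[a <= S_n <= b] from independent simulated walks."""
    rng = random.Random(seed)
    hits = sum(a <= walk_position(n, rng) <= b for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    print(exact_prob(10, -2, 2))  # 672/1024 = 0.65625
    print(simulated_prob(10, -2, 2, trials=20000))
```

The simulated frequency fluctuates around the exact value; increasing `trials` narrows the fluctuation roughly like one over its square root.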

The problem of random walks can be generalized to two or more
dimensions. Suppose that the particle at each stage suffers a displacement
in an (x, y) plane, and let $X_k$ and $Y_k$ denote, respectively, the change in the
x- and y-coordinates of the particle at the kth step. The position of the
particle after n steps is given by the random 2-tuple $(S_n, T_n)$, in which
$S_n = X_1 + X_2 + \cdots + X_n$ and $T_n = Y_1 + Y_2 + \cdots + Y_n$. We now
have the problem of determining the joint probability law of the random
variables $S_n$ and $T_n$.

The problem of random walks occurs in many branches of physics,
especially in its 2-dimensional form. The eminent mathematical statistician,
Karl Pearson, was the first to formulate explicitly the problem of the
2-dimensional random walk. After Pearson had formulated this problem
in 1905, the renowned physicist, Lord Rayleigh, pointed out that the
problem of random walks was formally “the same as that of the composition
of n isoperiodic vibrations of unit amplitude and of phases
distributed at random,” which he had considered as early as 1880 (for this
quotation and a history of the problem of random walks, see p. 87 of
S. Chandrasekhar, “Stochastic Problems in Physics and Astronomy,”
Reviews of Modern Physics, Volume 15 (1943), pp. 1-89). Almost all
scattering problems in physics are instances of the problem of random
walks.


► Example 1A. A physical example of random walk. Consider the
amplitude and phase of a radar signal that has been reflected by a cloud.
Each of the water drops in the cloud reflects a signal with a different
amplitude and phase. The return signal received by the radar system is
the resultant of all the signals reflected by each of the water drops in the
cloud; thus one sees that formally the amplitude and phase of the signal
returned by the cloud to the radar system is the sum of a (large) number of
(presumably independent) random variables. ◄


In the study of sums of independent random variables a basic role is
played by the notion of the characteristic function of a random variable.
This notion is introduced in section 2.


2. THE CHARACTERISTIC FUNCTION OF A 
RANDOM VARIABLE 

It has been pointed out that the probability law of a random variable X
may be specified in a variety of ways. To begin with, either its probability
function $P_X[\cdot]$ or its distribution function $F_X(\cdot)$ may be stated. Further,




if the probability law is known to be continuous or discrete, then it may
be specified by stating either its probability density function $f_X(\cdot)$ or its
probability mass function $p_X(\cdot)$. We now describe yet another function,
denoted by $\phi_X(\cdot)$ and called the characteristic function of the random variable X,
which has the property that a knowledge of $\phi_X(\cdot)$ serves to specify the
probability law of the random variable X. Further, we shall see that the
characteristic function has properties which render it particularly useful
for the study of a sum of independent random variables.

To begin our introduction of the characteristic function, let us note the
following fact about the probability function $P_X[\cdot]$ and the distribution
function $F_X(\cdot)$ of a random variable X. Both functions can be regarded as
the value of the expectation (with respect to the probability law of X) of
various Borel functions $g(\cdot)$. Thus, for every Borel set B of real numbers

(2.1)    $P_X[B] = E_X[I_B(x)] = E[I_B(X)]$,

in which $I_B(\cdot)$ is a function of a real variable, called the indicator function
of the set B, with value $I_B(x)$ at any point x given by

(2.2)    $I_B(x) = 1$ if x belongs to B,
         $= 0$ if x does not belong to B.


On the other hand, for every real number y

(2.3)    $F_X(y) = E_X[I_y(x)] = E[I_y(X)]$,

in which $I_y(\cdot)$ is a function of a real variable, defined by

(2.4)    $I_y(x) = 1$ if $x \le y$,
         $= 0$ if $x > y$.


We thus see that if one knows the expectation $E_X[g(x)]$ of every bounded
Borel function $g(\cdot)$, with respect to the probability law of the random
variable X, one will know by (2.1) and (2.3) the probability function and
distribution function of X. Conversely, a knowledge of the probability
function or of the distribution function of X yields a knowledge of $E[g(X)]$
for every function $g(\cdot)$ for which the expectation exists. Consequently,
stating the expectation functional $E_X[g(x)]$ (which is a function whose argument
is a function) constitutes another equivalent way of specifying the probability
law of a random variable.

The question arises: is there any other family of functions on the real
line, in addition to those of the form of (2.2) and (2.4), such that a knowledge
of the expectations of these functions with respect to the probability law
of a random variable X would suffice to specify the probability law? We
now show that the complex exponential functions provide such a family.




We define the expectation, with respect to a random variable X, of a
function $g(\cdot)$, which takes values that are complex numbers, by

(2.5)    $E[g(X)] = E[\mathrm{Re}\, g(X)] + iE[\mathrm{Im}\, g(X)]$,

in which the symbols Re and Im, respectively, are abbreviations of the
phrases “real part of” and “imaginary part of.” Note that

$g(x) = \mathrm{Re}\, g(x) + i\, \mathrm{Im}\, g(x)$.

It may be shown that under these definitions all the usual properties of
the operation of taking expectations continue to hold for complex-valued
functions whose expectations exist. We define $E[g(X)]$ as existing if
$E[|g(X)|]$ is finite. If this is the case, it then follows that

(2.6)    $|E[g(X)]| \le E[|g(X)|]$,

or, more explicitly,

(2.7)    $\{E^2[\mathrm{Re}\, g(X)] + E^2[\mathrm{Im}\, g(X)]\}^{1/2} \le E[\{(\mathrm{Re}\, g(X))^2 + (\mathrm{Im}\, g(X))^2\}^{1/2}]$.

The validity of (2.7) is proved in theoretical exercise 2.2. In words, (2.6)
states that the modulus of the expectation of a complex-valued function is
less than or equal to the expectation of the modulus of the function.

The notions are now at hand to define the characteristic function $\phi_X(\cdot)$
of a random variable X. We define $\phi_X(\cdot)$ as a function of a real variable u
whose value is the expectation of the complex exponential function $e^{iux}$
with respect to the probability law of X; in symbols,

(2.8)    $\phi_X(u) = E[e^{iuX}] = \int_{-\infty}^{\infty} e^{iux}\, dF_X(x)$.

The quantity $e^{iux}$ for any real numbers x and u is defined by

(2.9)    $e^{iux} = \cos ux + i \sin ux$,

in which i is the imaginary unit, defined by $i = \sqrt{-1}$ or $i^2 = -1$. Since
$|e^{iux}|^2 = (\cos ux)^2 + (\sin ux)^2 = 1$, it follows that, for any random
variable X, $E[|e^{iuX}|] = E[1] = 1$. Consequently, the characteristic function
always exists.
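For a discrete law, (2.8) reduces to a finite or countable sum, $\phi_X(u) = \sum_k e^{iux_k} p_X(x_k)$. The Python sketch below evaluates this sum for a fair die (a hypothetical example; the names `char_fn`, `die_values`, and `die_probs` are illustrative) and checks numerically three facts that follow from the definition: $\phi_X(0) = 1$, $|\phi_X(u)| \le 1$, and $\phi_X(-u)$ is the complex conjugate of $\phi_X(u)$.

```python
import cmath


def char_fn(values, probs, u):
    """phi_X(u) = E[e^{iuX}] = sum over the support of e^{iux} p_X(x)."""
    return sum(p * cmath.exp(1j * u * x) for x, p in zip(values, probs))


die_values = list(range(1, 7))
die_probs = [1 / 6] * 6

if __name__ == "__main__":
    for u in (0.0, 0.7, 2.5, -3.1):
        phi = char_fn(die_values, die_probs, u)
        # |phi(u)| never exceeds E[|e^{iuX}|] = 1
        print(u, phi, abs(phi))
```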


The characteristic function of a random variable has all the properties
of the moment-generating function of a random variable. All the moments
of the random variable X that exist may be obtained from a knowledge of
the characteristic function by the formula

(2.10)    $E[X^k] = \frac{1}{i^k} \frac{d^k}{du^k} \phi_X(u) \Big|_{u=0}$.

To prove (2.10), one must employ the techniques discussed in section 5.
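Formula (2.10) can be checked numerically by differentiating a known characteristic function at u = 0. The sketch below uses the Poisson characteristic function $e^{\lambda(e^{iu}-1)}$, which is derived later in this section as (2.17), with the illustrative value λ = 3, and replaces the derivatives by central finite differences. The step size h is an arbitrary choice, so the results are approximations rather than exact moments.

```python
import cmath


def poisson_cf(u: float, lam: float = 3.0) -> complex:
    """Poisson characteristic function exp(lam * (e^{iu} - 1)), equation (2.17)."""
    return cmath.exp(lam * (cmath.exp(1j * u) - 1))


def moment_from_cf(phi, k: int, h: float = 1e-3) -> float:
    """Approximate E[X^k] = (1/i^k) * d^k phi/du^k at u = 0, for k = 1 or 2.

    Central differences stand in for the exact derivatives in (2.10).
    """
    if k == 1:
        deriv = (phi(h) - phi(-h)) / (2 * h)
    elif k == 2:
        deriv = (phi(h) - 2 * phi(0.0) + phi(-h)) / h ** 2
    else:
        raise ValueError("this sketch only handles k = 1 and k = 2")
    return (deriv / 1j ** k).real


if __name__ == "__main__":
    # For a Poisson law with mean 3: E[X] = 3 and E[X^2] = 3 + 3**2 = 12.
    print(moment_from_cf(poisson_cf, 1), moment_from_cf(poisson_cf, 2))
```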




More generally, from a knowledge of the characteristic function of a
random variable one may obtain a knowledge of its distribution function,
its probability density function (if it exists), its probability mass function,
and many other expectations. These facts are established in section 3.

The importance of characteristic functions in probability theory derives
from the fact that they have the following basic property. Consider any
two random variables X and Y. If their characteristic functions are approximately
equal [that is, $\phi_X(u) \approx \phi_Y(u)$ for every real number u], then their
probability laws are approximately equal over intervals (that is, for any
finite numbers a and b, $P[a < X \le b] \approx P[a < Y \le b]$) or, equivalently,
their distribution functions are approximately equal [that is, $F_X(a) \approx F_Y(a)$
for all real numbers a]. A precise formulation and proof of this assertion
is given in Chapter 10.

Characteristic functions represent the ideal tool for the study of the
problem of addition of independent random variables, since the sum
$X_1 + X_2$ of two independent random variables $X_1$ and $X_2$ has as its
characteristic function the product of the characteristic functions of $X_1$
and $X_2$; in symbols, for every real number u

(2.11)    $\phi_{X_1+X_2}(u) = \phi_{X_1}(u)\, \phi_{X_2}(u)$

if $X_1$ and $X_2$ are independent. It is natural to inquire whether there is
some other function that enjoys properties similar to those of the characteristic
function. The answer appears to be in the negative. In his paper
“An essential property of the Fourier transforms of distribution functions,”
Proceedings of the American Mathematical Society, Vol. 3 (1952), pp. 508-510,
E. Lukacs has proved the following theorem. Let K(x, u) be a complex
valued function of two real variables x and u, which is a bounded Borel
function of x. Define for any random variable X

$\psi_X(u) = E[K(X, u)]$.

In order that the function $\psi_X(u)$ satisfy (2.11) and the uniqueness condition

(2.12)    $\psi_{X_1}(u) = \psi_{X_2}(u)$ for all u if and only if $F_{X_1}(x) = F_{X_2}(x)$ for all x,

it is necessary and sufficient that K(x, u) have the form

$K(x, u) = e^{ixA(u)}$,

in which A(u) is a suitable real valued function.


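► Example 2A. If X is N(0, 1), then its characteristic function $\phi_X(u)$ is
given by

(2.13)    $\phi_X(u) = e^{-u^2/2}$.

To prove (2.13), we make use of the Taylor series expansion of the
exponential function, together with the facts that the odd moments of N(0, 1)
vanish and $E[X^{2m}] = (2m)!/(2^m m!)$:

(2.14)    $\phi_X(u) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{iux} e^{-x^2/2}\, dx = \sum_{n=0}^{\infty} \frac{(iu)^n}{n!} \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} x^n e^{-x^2/2}\, dx$
          $= \sum_{m=0}^{\infty} \frac{(iu)^{2m}}{(2m)!} \frac{(2m)!}{2^m m!} = \sum_{m=0}^{\infty} \frac{1}{m!} \left(-\frac{u^2}{2}\right)^m = e^{-u^2/2}$.

The interchange of the order of summation and integration in (2.14) may
be justified by the fact that the infinite series is dominated by the integrable
function $\exp(|ux| - \frac{1}{2}x^2)$. ◄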
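Formula (2.13) can also be confirmed by evaluating the defining integral (2.8) numerically. The Python sketch below applies composite Simpson's rule to $\frac{1}{\sqrt{2\pi}}\int e^{iux}e^{-x^2/2}\,dx$ over the truncated range [-10, 10]; the truncation point and the number of subintervals are assumptions chosen so that the neglected tails and the discretization error are negligible.

```python
import cmath
import math


def simpson(f, a: float, b: float, n: int = 2000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals; f may be complex."""
    h = (b - a) / n
    total = f(a) + f(b)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * f(a + j * h)
    return total * h / 3


def normal_cf_numeric(u: float, cutoff: float = 10.0) -> complex:
    """(1/sqrt(2 pi)) * int e^{iux} e^{-x^2/2} dx, truncated to [-cutoff, cutoff]."""
    integrand = lambda x: cmath.exp(1j * u * x) * math.exp(-x * x / 2)
    return simpson(integrand, -cutoff, cutoff) / math.sqrt(2 * math.pi)


if __name__ == "__main__":
    for u in (0.0, 1.0, 2.0):
        print(u, normal_cf_numeric(u), math.exp(-u * u / 2))
```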


► Example 2B. If X is N(m, σ²), then its characteristic function $\phi_X(u)$
is given by

(2.15)    $\phi_X(u) = \exp(imu - \frac{1}{2}\sigma^2 u^2)$.

To prove (2.15), define $Y = (X - m)/\sigma$. Then Y is N(0, 1), and
$\phi_Y(u) = e^{-u^2/2}$. Since X may be written as a linear combination,
$X = \sigma Y + m$, the validity of (2.15) follows from the general formula

(2.16)    $\phi_X(u) = e^{iub}\, \phi_Y(au)$    if $X = aY + b$. ◄


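► Example 2C. If X is Poisson distributed with mean E[X] = λ, then
its characteristic function $\phi_X(u)$ is given by

(2.17)    $\phi_X(u) = e^{\lambda(e^{iu} - 1)}$.

To prove (2.17), we write

(2.18)    $\phi_X(u) = \sum_{k=0}^{\infty} e^{iuk} p_X(k) = \sum_{k=0}^{\infty} e^{iuk} e^{-\lambda} \frac{\lambda^k}{k!} = e^{-\lambda} \sum_{k=0}^{\infty} \frac{(\lambda e^{iu})^k}{k!} = e^{-\lambda} e^{\lambda e^{iu}}$. ◄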
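The series manipulation in (2.18) is easy to check numerically: truncating the sum $\sum_k e^{iuk} p_X(k)$ after enough terms should reproduce the closed form (2.17). In the sketch below the number of retained terms (100) is an assumption that is ample for moderate λ; the function names are illustrative.

```python
import cmath
import math


def poisson_cf_series(u: float, lam: float, terms: int = 100) -> complex:
    """Truncated first sum in (2.18): sum_k e^{iuk} e^{-lam} lam^k / k!."""
    total = 0j
    pk = math.exp(-lam)  # p_X(0)
    for k in range(terms):
        total += cmath.exp(1j * u * k) * pk
        pk *= lam / (k + 1)  # p_X(k + 1) = p_X(k) * lam / (k + 1)
    return total


def poisson_cf_closed(u: float, lam: float) -> complex:
    """Closed form (2.17): exp(lam * (e^{iu} - 1))."""
    return cmath.exp(lam * (cmath.exp(1j * u) - 1))


if __name__ == "__main__":
    print(poisson_cf_series(1.0, 2.5), poisson_cf_closed(1.0, 2.5))
```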


► Example 2D. Consider a random variable X with a probability density
function, for some positive constant α,

(2.19)    $f_X(x) = \frac{\alpha}{2} e^{-\alpha|x|}$,    $-\infty < x < \infty$,

which is called Laplace's distribution. The characteristic function $\phi_X(u)$ is
given by

(2.20)    $\phi_X(u) = \frac{\alpha^2}{\alpha^2 + u^2}$.

To prove (2.20), we note that since $f_X(x)$ is an even function of x we may
write

(2.21)    $\phi_X(u) = 2\int_0^{\infty} \cos ux\, f_X(x)\, dx = \alpha \int_0^{\infty} e^{-\alpha x} \cos ux\, dx$
          $= \alpha \left[\frac{e^{-\alpha x}(u \sin ux - \alpha \cos ux)}{\alpha^2 + u^2}\right]_0^{\infty} = \frac{\alpha^2}{\alpha^2 + u^2}$. ◄
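The final expression in (2.21) can be checked numerically. The sketch below evaluates $\alpha\int_0^\infty e^{-\alpha x}\cos ux\,dx$ by Simpson's rule on [0, 40/α], a truncation assumed here because $e^{-40}$ is negligible, and compares the result with $\alpha^2/(\alpha^2 + u^2)$. The value α = 1.5 used in the usage example is arbitrary.

```python
import math


def laplace_cf_numeric(u: float, alpha: float, n: int = 20000) -> float:
    """alpha * int_0^inf e^{-alpha x} cos(ux) dx, truncated at 40/alpha, by Simpson's rule."""
    upper = 40.0 / alpha
    h = upper / n  # n must be even for Simpson's rule
    f = lambda x: math.exp(-alpha * x) * math.cos(u * x)
    total = f(0.0) + f(upper)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * f(j * h)
    return alpha * total * h / 3


def laplace_cf_closed(u: float, alpha: float) -> float:
    """Equation (2.20)."""
    return alpha ** 2 / (alpha ** 2 + u ** 2)


if __name__ == "__main__":
    for u in (0.0, 1.0, 2.5):
        print(u, laplace_cf_numeric(u, 1.5), laplace_cf_closed(u, 1.5))
```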


THEORETICAL EXERCISES 


2.1. Cumulants and the log-characteristic function. The logarithm (to the base e)
of the characteristic function of a random variable X is often easy to
differentiate. Its nth derivative may be used to form the nth cumulant of
X, written $K_n[X]$, which is defined by

(2.22)    $K_n[X] = \frac{1}{i^n} \frac{d^n}{du^n} \log \phi_X(u) \Big|_{u=0}$.

If the nth absolute moment $E[|X|^n]$ exists, then both $\phi_X(\cdot)$ and $\log \phi_X(\cdot)$
are differentiable n times and may be expanded in terms of their first n
derivatives; in particular,

(2.23)    $\log \phi_X(u) = K_1[X](iu) + K_2[X]\frac{(iu)^2}{2!} + \cdots + K_n[X]\frac{(iu)^n}{n!} + R_n(u)$,

in which the remainder $R_n(u)$ is such that $|u|^{-n} R_n(u)$ tends to 0 as |u| tends
to 0. From a knowledge of the cumulants of a probability law one may
obtain a knowledge both of its moments and its central moments. Show by
evaluating the derivatives at t = 0 of $e^{K(t)}$, in which $K(t) = \log \phi_X(t)$, that

(2.24)    $E[X] = K_1$
          $E[X^2] = K_2 + K_1^2$
          $E[X^3] = K_3 + 3K_2K_1 + K_1^3$
          $E[X^4] = K_4 + 4K_3K_1 + 3K_2^2 + 6K_2K_1^2 + K_1^4$.

Show, by evaluating the derivatives of $e^{K_c(t)}$, in which $K_c(t) = \log \phi_X(t) -
itm$ and $m = E[X]$, that

(2.25)    $E[(X - m)^2] = K_2$
          $E[(X - m)^3] = K_3$
          $E[(X - m)^4] = K_4 + 3K_2^2$.


2.2. The square root of sum of squares inequality. Prove that (2.7) holds by
showing that for any 2 random variables, X and Y,

(2.26)    $\sqrt{E^2[X] + E^2[Y]} \le E[\sqrt{X^2 + Y^2}]$.

Hint: Show, and use the fact, that $\sqrt{x^2 + y^2} - \sqrt{x_0^2 + y_0^2} \ge [(x - x_0)x_0
+ (y - y_0)y_0]/\sqrt{x_0^2 + y_0^2}$ for real x, y, $x_0$, $y_0$ with $x_0 y_0 \ne 0$.
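The relations (2.24) can be sanity-checked on a law whose cumulants are all known. For the Poisson law with mean λ, $\log\phi_X(u) = \lambda(e^{iu} - 1) = \lambda\sum_{n\ge1}(iu)^n/n!$, so by (2.22) every cumulant equals λ. The Python sketch below compares raw moments obtained by direct summation over the probability mass function with those predicted by (2.24); it is a numerical check at one value of λ, not a proof, and the function names are illustrative.

```python
import math


def poisson_raw_moment(k: int, lam: float, terms: int = 200) -> float:
    """E[X^k] for a Poisson law with mean lam, by direct summation of j^k p_X(j)."""
    total, pj = 0.0, math.exp(-lam)  # pj holds p_X(j), starting at j = 0
    for j in range(terms):
        total += j ** k * pj
        pj *= lam / (j + 1)
    return total


def moments_from_cumulants(K1: float, K2: float, K3: float, K4: float) -> tuple:
    """Raw moments E[X], E[X^2], E[X^3], E[X^4] in terms of cumulants, as in (2.24)."""
    return (
        K1,
        K2 + K1 ** 2,
        K3 + 3 * K2 * K1 + K1 ** 3,
        K4 + 4 * K3 * K1 + 3 * K2 ** 2 + 6 * K2 * K1 ** 2 + K1 ** 4,
    )


if __name__ == "__main__":
    lam = 2.0
    # Every cumulant of the Poisson law equals lam.
    print(moments_from_cumulants(lam, lam, lam, lam))  # (2.0, 6.0, 22.0, 94.0)
    print([poisson_raw_moment(k, lam) for k in range(1, 5)])
```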


EXERCISE 


2.1. Compute the characteristic function of a random variable X that has as
its probability law (i) the binomial distribution with mean 3 and standard
deviation 2, (ii) the Poisson distribution with mean 3, (iii) the … distribution
with parameter p = 4, (iv) the normal distribution with mean …
and standard deviation 2, (v) the gamma distribution with parameters r = 2
and λ = 3.


3. THE CHARACTERISTIC FUNCTION OF A RANDOM 
VARIABLE SPECIFIES ITS PROBABILITY LAW 


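In this section we show that in order to specify the probability law of a random
variable it suffices to specify its characteristic function.
We first prove a theorem which provides, for a suitable class of functions $g(\cdot)$,
an explicit formula for $E[g(X)]$ in terms of the characteristic function $\phi_X(\cdot)$.

THEOREM 3A. Let $g(\cdot)$ be a bounded Borel function of a real variable that
has at every point x a limit from the right $g(x + 0)$ and a limit from
the left $g(x - 0)$. Let

(3.1)    $g^*(x) = \frac{g(x + 0) + g(x - 0)}{2}$

be the arithmetic mean of these limits. Assume further that $g(x)$ is
absolutely integrable; that is,

(3.2)    $\int_{-\infty}^{\infty} |g(x)|\, dx < \infty$.

Define $\gamma(\cdot)$ as the Fourier integral (or transform) of $g(\cdot)$; that is, for every
real number u

(3.3)    $\gamma(u) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-iux} g(x)\, dx$.

Then, for any random variable X the expectation $E[g^*(X)]$ may be expressed
in terms of the characteristic function $\phi_X(\cdot)$:

(3.4)    $E[g^*(X)] = \int_{-\infty}^{\infty} g^*(x)\, dF_X(x) = \lim_{U \to \infty} \int_{-U}^{U} \left(1 - \frac{|u|}{U}\right) \gamma(u)\, \phi_X(u)\, du$.

The proof of this important theorem is given in section 5. In this
section we discuss its consequences.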



If the product $\gamma(u)\phi_X(u)$ is absolutely integrable, that is,

(3.5)    $\int_{-\infty}^{\infty} |\gamma(u)\phi_X(u)|\, du < \infty$,

then (3.4) may be written

(3.6)    $E[g^*(X)] = \int_{-\infty}^{\infty} \gamma(u)\, \phi_X(u)\, du$.

Without imposing the condition (3.5), it is incorrect to write (3.6). Indeed,
in order even to write (3.6) the integral on the right-hand side of (3.6)
must exist; this is equivalent to (3.5) being true.

We next take for $g(\cdot)$ a function defined as follows, for some finite
numbers a and b (with a < b):

(3.7)    $g(x) = 1$ if $a < x < b$
         $= \frac{1}{2}$ if $x = a$ or $x = b$
         $= 0$ if $x < a$ or $x > b$.

The function $g(\cdot)$ defined by (3.7) fulfills the hypotheses of theorem 3A;
it is bounded, absolutely integrable, and possesses right-hand and left-hand
limits at any point x. Further, for every x, $g^*(x) = g(x)$. Now, if a and b
are points at which the distribution function $F_X(\cdot)$ is continuous, then

(3.8)    $E[g^*(X)] = F_X(b) - F_X(a)$.


Further,

(3.9)    $\gamma(u) = \frac{1}{2\pi} \frac{e^{-iua} - e^{-iub}}{iu}$.

Consequently, with this choice of function $g(\cdot)$, theorem 3A yields an
expression for the distribution function of a random variable in terms of its
characteristic function.

THEOREM 3B. If a and b, where a < b, are finite real numbers at which
the distribution function $F_X(\cdot)$ is continuous, then

(3.10)    $F_X(b) - F_X(a) = \lim_{U \to \infty} \frac{1}{2\pi} \int_{-U}^{U} \left(1 - \frac{|u|}{U}\right) \frac{e^{-iua} - e^{-iub}}{iu}\, \phi_X(u)\, du$.

Equation (3.10) constitutes an inversion formula, whereby, with a
knowledge of the characteristic function $\phi_X(\cdot)$, a knowledge of the
distribution function $F_X(\cdot)$ may be obtained.




An explicit inversion formula for $F_X(x)$ in terms of $\phi_X(\cdot)$ may be
written in various ways. Since $\lim_{a \to -\infty} F_X(a) = 0$, we determine from (3.10)
that at any point x where $F_X(\cdot)$ is continuous

(3.11)    $F_X(x) = \lim_{a \to -\infty} \lim_{U \to \infty} \frac{1}{2\pi} \int_{-U}^{U} \left(1 - \frac{|u|}{U}\right) \frac{e^{-iua} - e^{-iux}}{iu}\, \phi_X(u)\, du$.

The limit is taken over the set of points a which are continuity points of
$F_X(\cdot)$.
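A more useful inversion formula, the proof of which is given in section 5,
is the following: at any point x where $F_X(\cdot)$ is continuous,

(3.12)    $F_X(x) = \frac{1}{2} - \frac{1}{\pi} \int_0^{\infty} \frac{1}{u}\, \mathrm{Im}[e^{-iux} \phi_X(u)]\, du$.

The integral is an improper Riemann integral, defined as

$\lim_{\epsilon \to 0,\ U \to \infty} \int_{\epsilon}^{U} \frac{1}{u}\, \mathrm{Im}[e^{-iux} \phi_X(u)]\, du$.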
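Formula (3.12) lends itself to direct numerical evaluation when $\phi_X(u)$ decays quickly. The sketch below applies the midpoint rule, which never evaluates the integrand at u = 0, to the standard normal characteristic function $e^{-u^2/2}$ from (2.13). It truncates the integral at u = 40 (an assumption that is harmless here because the integrand is negligible beyond that point) and compares the result with the normal distribution function computed from `math.erf`. For laws whose characteristic function does not decay, such as lattice laws, this truncation would not be adequate.

```python
import cmath
import math


def cdf_from_cf(phi, x: float, upper: float = 40.0, n: int = 40000) -> float:
    """Approximate (3.12): F(x) = 1/2 - (1/pi) * int_0^inf Im[e^{-iux} phi(u)] / u du.

    The integral is truncated at `upper` and evaluated with the midpoint rule,
    so the integrand is never evaluated at u = 0.
    """
    h = upper / n
    total = 0.0
    for j in range(n):
        u = (j + 0.5) * h
        total += (cmath.exp(-1j * u * x) * phi(u)).imag / u
    return 0.5 - total * h / math.pi


def std_normal_cdf(x: float) -> float:
    """Reference value of the N(0, 1) distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


if __name__ == "__main__":
    phi = lambda u: math.exp(-u * u / 2)  # (2.13)
    for x in (-1.5, 0.0, 0.3, 2.0):
        print(x, cdf_from_cf(phi, x), std_normal_cdf(x))
```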


Equations (3.11) and (3.12) lead immediately to the uniqueness theorem, 
which states that there is a one-to-one correspondence between distribution 
functions and characteristic functions; two characteristic functions that 
are equal at all points (or equal at all except a countable number of points) 
are the characteristic functions of the same distribution function, and two 
distribution functions that are equal at all except a countable number of 
points give rise to the same characteristic function. 

We may express the probability mass function $p_X(\cdot)$ of the random
variable X in terms of its characteristic function; for any real number x

(3.13)    $p_X(x) = P[X = x] = F_X(x + 0) - F_X(x - 0) = \lim_{U \to \infty} \frac{1}{2U} \int_{-U}^{U} e^{-iux} \phi_X(u)\, du$.

The proof of (3.13) is given in section 5.
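A numerical version of (3.13) is sketched below for a lattice law. It assumes a Poisson law with λ = 2, whose characteristic function (2.17) is periodic with period 2π, and uses U = 10π with Simpson's rule; these choices are illustrative. For an integer x the integrand is then periodic and the interval [-U, U] covers whole periods, so the average already reproduces $p_X(x)$ up to rounding; in general (3.13) holds only in the limit U → ∞.

```python
import cmath
import math


def point_mass_from_cf(phi, x: float, U: float, n: int = 20000) -> float:
    """(1/2U) * int_{-U}^{U} e^{-iux} phi(u) du, evaluated by Simpson's rule (n even)."""
    h = 2 * U / n
    f = lambda u: cmath.exp(-1j * u * x) * phi(u)
    total = f(-U) + f(U)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * f(-U + j * h)
    return (total * h / 3 / (2 * U)).real


lam = 2.0
poisson_phi = lambda u: cmath.exp(lam * (cmath.exp(1j * u) - 1))  # (2.17)

if __name__ == "__main__":
    for k in range(5):
        exact = math.exp(-lam) * lam ** k / math.factorial(k)
        print(k, point_mass_from_cf(poisson_phi, k, U=10 * math.pi), exact)
```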
It is possible to give a criterion in terms of characteristic functions that
a random variable X has an absolutely continuous probability law.* If the
characteristic function $\phi_X(\cdot)$ is absolutely integrable, that is,

(3.14)    $\int_{-\infty}^{\infty} |\phi_X(u)|\, du < \infty$,




then the random variable X obeys the absolutely continuous probability law
specified by the probability density function $f_X(\cdot)$, for any real number x
given by

(3.15)    $f_X(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-iux} \phi_X(u)\, du$.

One expresses (3.15) in words by saying that $f_X(\cdot)$ is the Fourier transform,
or Fourier integral, of $\phi_X(\cdot)$.

The proof of (3.15) follows immediately from the fact that at any
continuity points x and a of $F_X(\cdot)$

(3.16)    $F_X(x) - F_X(a) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{-iua} - e^{-iux}}{iu}\, \phi_X(u)\, du$.

Equation (3.16) follows from (3.6) in the same way that (3.10) followed
from (3.4). It may be proved from (3.16) that (i) $F_X(\cdot)$ is continuous at
every point x, (ii) $f_X(x) = (d/dx)F_X(x)$ exists at every real number x and is
given by (3.15), (iii) for any numbers a and b, $F_X(b) - F_X(a) = \int_a^b f_X(x)\, dx$.
From these facts it follows that $F_X(\cdot)$ is specified by $f_X(\cdot)$ and that $f_X(x)$ is
given by (3.15).
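The inversion formula (3.15) can be tried numerically on Laplace's distribution from Example 2D. Because $\phi_X(u) = 1/(1 + u^2)$ (taking α = 1 in (2.20)) is real and even, (3.15) reduces to $\frac{1}{\pi}\int_0^\infty \cos(ux)\,\phi_X(u)\,du$. The sketch truncates this integral at u = 1000, an assumption that leaves an error of order $1/(1000\pi)$ at x = 0 because $\phi_X$ decays only like $u^{-2}$; the recovered values should therefore be close to, but not exactly, $\frac{1}{2}e^{-|x|}$.

```python
import math


def density_from_cf(phi, x: float, L: float = 1000.0, n: int = 100000) -> float:
    """Approximate (3.15) for a real, even characteristic function phi.

    In that case f(x) = (1/pi) * int_0^inf cos(ux) phi(u) du; the integral is
    truncated at L and evaluated by Simpson's rule with n (even) subintervals.
    """
    h = L / n
    f = lambda u: math.cos(u * x) * phi(u)
    total = f(0.0) + f(L)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * f(j * h)
    return total * h / 3 / math.pi


if __name__ == "__main__":
    phi = lambda u: 1.0 / (1.0 + u * u)  # (2.20) with alpha = 1
    for x in (0.0, 0.5, 1.0, 2.0):
        print(x, density_from_cf(phi, x), 0.5 * math.exp(-abs(x)))
```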
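The inversion formula (3.15) provides a powerful method of calculating
Fourier transforms and characteristic functions. Thus, for example, from
a knowledge that

(3.17)    $\left(\frac{\sin(u/2)}{u/2}\right)^2 = \int_{-\infty}^{\infty} e^{iux} f(x)\, dx$,

where $f(\cdot)$ is defined by

(3.18)    $f(x) = 1 - |x|$ for $|x| \le 1$
          $= 0$ otherwise,

it follows by (3.15) that

(3.19)    $\frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-iux} \left(\frac{\sin(x/2)}{x/2}\right)^2 dx = f(u)$.

Similarly, from

(3.20)    $\frac{1}{1 + u^2} = \int_{-\infty}^{\infty} e^{iux}\, \frac{1}{2} e^{-|x|}\, dx$,

it follows that

(3.21)    $e^{-|u|} = \int_{-\infty}^{\infty} e^{-iux}\, \frac{1}{\pi} \frac{1}{1 + x^2}\, dx$.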
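Identity (3.17), one of those exercise 3.1 asks the reader to verify, can at least be spot-checked numerically. The sketch below integrates $e^{iux}(1 - |x|)$ over [-1, 1] with Simpson's rule, splitting at x = 0 so that the kink of f(·) falls on a grid point, and compares the result with $(\sin(u/2)/(u/2))^2$. Agreement at a few values of u is evidence, not a proof; the function names are illustrative.

```python
import cmath
import math


def simpson(f, a: float, b: float, n: int = 2000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals; f may be complex."""
    h = (b - a) / n
    total = f(a) + f(b)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * f(a + j * h)
    return total * h / 3


def triangle_cf_numeric(u: float) -> complex:
    """int_{-1}^{1} e^{iux} (1 - |x|) dx, split at 0 so the kink of f is a grid point."""
    f = lambda x: cmath.exp(1j * u * x) * (1 - abs(x))
    return simpson(f, -1.0, 0.0) + simpson(f, 0.0, 1.0)


def triangle_cf_closed(u: float) -> float:
    """Left-hand side of (3.17), (sin(u/2) / (u/2))^2, with value 1 at u = 0."""
    if u == 0:
        return 1.0
    return (math.sin(u / 2) / (u / 2)) ** 2


if __name__ == "__main__":
    for u in (0.0, 0.5, 3.0, 10.0):
        print(u, triangle_cf_numeric(u), triangle_cf_closed(u))
```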




We note finally the following important formulas concerning the sum of two
independent random variables, expressed in terms of their distribution functions
and characteristic functions. Let $X_1$ and $X_2$ be independent random variables,
with respective distribution functions $F_{X_1}(\cdot)$ and $F_{X_2}(\cdot)$ and
respective characteristic functions $\phi_{X_1}(\cdot)$ and $\phi_{X_2}(\cdot)$. It may be shown
(see section 9 of Chapter 7) that the distribution function of the sum $X_1 + X_2$
for any real number z is given by

(3.22)    $F_{X_1+X_2}(z) = \int_{-\infty}^{\infty} F_{X_1}(z - x)\, dF_{X_2}(x)$.

On the other hand, it is clear that the characteristic function of the sum
for any real number u is given by

(3.23)    $\phi_{X_1+X_2}(u) = \phi_{X_1}(u)\, \phi_{X_2}(u)$,

since, by independence of $X_1$ and $X_2$, $E[e^{iu(X_1+X_2)}] = E[e^{iuX_1}]\, E[e^{iuX_2}]$. The
distribution function $F_{X_1+X_2}(\cdot)$, given by (3.22), is said to be the convolution
of the distribution functions $F_{X_1}(\cdot)$ and $F_{X_2}(\cdot)$; in symbols, one writes

$F_{X_1+X_2} = F_{X_1} * F_{X_2}$.
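For discrete random variables the convolution (3.22) has a simple probability-mass analogue: the probability that $X_1 + X_2 = s$ is the sum of $p_{X_1}(x_1)p_{X_2}(x_2)$ over all pairs with $x_1 + x_2 = s$. The Python sketch below, which uses two fair dice as a hypothetical example, computes that convolution and checks (3.23) numerically at a few values of u.

```python
import cmath
from collections import defaultdict


def convolve_pmfs(p1: dict, p2: dict) -> dict:
    """pmf of X1 + X2 for independent discrete X1, X2, each given as {value: probability}."""
    out = defaultdict(float)
    for x1, q1 in p1.items():
        for x2, q2 in p2.items():
            out[x1 + x2] += q1 * q2
    return dict(out)


def cf(pmf: dict, u: float) -> complex:
    """phi(u) = sum over the support of e^{iux} p(x)."""
    return sum(q * cmath.exp(1j * u * x) for x, q in pmf.items())


die = {k: 1 / 6 for k in range(1, 7)}
two_dice = convolve_pmfs(die, die)

if __name__ == "__main__":
    print(two_dice[7])  # approximately 6/36
    for u in (0.3, 1.1, 2.0):
        # (3.23): the characteristic function of the sum is the product of the factors
        print(abs(cf(two_dice, u) - cf(die, u) ** 2))
```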


EXERCISES
3.1. Verify (3.17), (3.19), (3.20), and (3.21).

3.2. Prove that if $f_1(\cdot)$ and $f_2(\cdot)$ are probability density functions, whose corresponding
characteristic functions $\phi_1(\cdot)$ and $\phi_2(\cdot)$ are absolutely integrable,
then

(3.24)    $\int_{-\infty}^{\infty} f_1(y - x) f_2(x)\, dx = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-iuy} \phi_1(u) \phi_2(u)\, du$.

3.3. Use (3.15), (3.17), and (3.24) to prove that

(3.25)    $\frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-iuy} \left(\frac{\sin(u/2)}{u/2}\right)^4 du = \int_{-\infty}^{\infty} f(y - x) f(x)\, dx$,

in which $f(\cdot)$ is defined by (3.18). Evaluate the integral on the right-hand side of (3.25).

3.4. Let X be uniformly distributed over the interval 0 to π. Let Y = A cos X.
Show directly that the probability density function of Y for any real
number y is given by

(3.26)    $f_Y(y) = \frac{1}{\pi\sqrt{A^2 - y^2}}$ if $|y| < A$
          $= 0$ otherwise.

The characteristic function of Y may be written

(3.27)    $\phi_Y(u) = \frac{1}{\pi} \int_0^{\pi} e^{iuA\cos x}\, dx = J_0(Au)$,

in which $J_0(\cdot)$ is the Bessel function of order 0, defined for our purposes by
the integral in (3.27). Is it true or false that

(3.28)    $\frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-iuy} J_0(Au)\, du = \frac{1}{\pi\sqrt{A^2 - y^2}}$ if $|y| < A$
          $= 0$ otherwise?

3.5. The image interference distribution. The amplitude a of a signal received
at a distance from a transmitter may fluctuate because the signal is both
directly received and reflected (reflected either from the ionosphere or the
ocean floor, depending on whether it is being transmitted through the air or
the ocean). Assume that the amplitude of the direct signal is a constant $a_1$,
and the amplitude of the reflected signal is a constant $a_2$, but that the phase
difference θ between the two signals changes randomly and is uniformly
distributed over the interval 0 to π. The amplitude a of the received signal
is then given by $a^2 = a_1^2 + a_2^2 + 2a_1a_2\cos\theta$. Assuming these facts, show
that the characteristic function of $a^2$ is given by

(3.29)    $\phi_{a^2}(u) = e^{iu(a_1^2 + a_2^2)} J_0(2a_1a_2u)$.

Use this result and the preceding exercise to deduce the probability density
function of $a^2$.


4. SOLUTION OF THE PROBLEM OF THE ADDITION 
OF INDEPENDENT RANDOM VARIABLES BY THE 
METHOD OF CHARACTERISTIC FUNCTIONS 


By the use of characteristic functions, we may give a solution to the
problem of addition of independent random variables. Let $X_1, X_2, \ldots, X_n$
be n independent random variables, with respective characteristic functions
$\phi_{X_1}(\cdot), \ldots, \phi_{X_n}(\cdot)$. Let $S_n = X_1 + X_2 + \cdots + X_n$ be their sum. To
know the probability law of $S_n$, it suffices to know its characteristic
function $\phi_{S_n}(\cdot)$. However, it is immediate, from the properties of
independent random variables, that for every real number u

(4.1)    $\phi_{S_n}(u) = \phi_{X_1}(u)\, \phi_{X_2}(u) \cdots \phi_{X_n}(u)$

or, equivalently, $E[e^{iu(X_1 + \cdots + X_n)}] = E[e^{iuX_1}] \cdots E[e^{iuX_n}]$. Thus, in terms
of characteristic functions, the problem of addition of independent random 
variables is given by (4.1) a simple and concise solution, which may also 
be stated in words: the probability law of a sum of independent random 
variables has as its characteristic function the product of the characteristic 
functions of the individual random variables. 




In this section we consider certain cases in which (4.1) leads to an exact
evaluation of the probability law of $S_n$. In Chapter 10 we show how (4.1)
may be used to give a general approximate evaluation of the probability
law of $S_n$. There are various ways, given the characteristic function $\phi_{S_n}(\cdot)$ of the
sum $S_n$, in which one can deduce from it the probability law of $S_n$.

It may happen that $\phi_{S_n}(\cdot)$ will coincide with the characteristic function
of a known probability law. For example, for each k = 1, 2, ..., n,
suppose that $X_k$ is normally distributed with mean $m_k$ and variance $\sigma_k^2$.
Then $\phi_{X_k}(u) = \exp(ium_k - \frac{1}{2}u^2\sigma_k^2)$, and, by (4.1),

$\phi_{S_n}(u) = \exp[iu(m_1 + \cdots + m_n) - \frac{1}{2}u^2(\sigma_1^2 + \cdots + \sigma_n^2)]$.

We recognize $\phi_{S_n}(\cdot)$ as the characteristic function of the normal distribution
with mean $m_1 + \cdots + m_n$ and variance $\sigma_1^2 + \cdots + \sigma_n^2$. Therefore, the
sum $S_n$ is normally distributed with mean $m_1 + \cdots + m_n$ and variance
$\sigma_1^2 + \cdots + \sigma_n^2$. By using arguments of this type, we have the following
theorem.


THEOREM 4A. Let $S_n = X_1 + \cdots + X_n$ be the sum of independent
random variables.

(i) If, for k = 1, ..., n, $X_k$ is $N(m_k, \sigma_k^2)$, then $S_n$ is
$N(m_1 + \cdots + m_n, \sigma_1^2 + \cdots + \sigma_n^2)$.

(ii) If, for k = 1, ..., n, $X_k$ is binomial distributed with parameters $N_k$
and p, then $S_n$ is binomial distributed with parameters $N_1 + \cdots + N_n$
and p.

(iii) If, for k = 1, ..., n, $X_k$ is Poisson distributed with parameter $\lambda_k$,
then $S_n$ is Poisson distributed with parameter $\lambda_1 + \cdots + \lambda_n$.

(iv) If, for k = 1, ..., n, $X_k$ is $\chi^2$ distributed with $N_k$ degrees of freedom,
then $S_n$ is $\chi^2$ distributed with $N_1 + \cdots + N_n$ degrees of freedom.

(v) If, for k = 1, ..., n, $X_k$ is Cauchy distributed with parameters $a_k$
and $b_k$, then $S_n$ is Cauchy distributed with parameters $a_1 + \cdots + a_n$ and
$b_1 + \cdots + b_n$.
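Part (iii) of theorem 4A can be spot-checked without characteristic functions by convolving Poisson probability mass functions directly. In the sketch below the parameters $\lambda_1 = 1.3$ and $\lambda_2 = 2.2$ and the cutoff of 40 are arbitrary illustrative choices; for every k up to the cutoff the convolution of the truncated lists involves only retained terms, so it should agree with the Poisson probabilities for $\lambda_1 + \lambda_2$ up to rounding.

```python
import math


def poisson_pmf(lam: float, kmax: int) -> list:
    """[P[X = 0], ..., P[X = kmax]] for a Poisson law with mean lam."""
    return [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(kmax + 1)]


def convolve(p: list, q: list) -> list:
    """pmf of the sum of independent variables on {0, 1, 2, ...} with pmfs p and q."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pa in enumerate(p):
        for j, qb in enumerate(q):
            out[i + j] += pa * qb
    return out


lam1, lam2, kmax = 1.3, 2.2, 40
summed = convolve(poisson_pmf(lam1, kmax), poisson_pmf(lam2, kmax))
target = poisson_pmf(lam1 + lam2, kmax)

if __name__ == "__main__":
    print(max(abs(summed[k] - target[k]) for k in range(kmax + 1)))
```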


One may be able to invert the characteristic function of $S_n$ to obtain its
distribution function or probability density function. In particular, if
$\phi_{S_n}(\cdot)$ is absolutely integrable, then $S_n$ has a probability density function
for any real number x given by

(4.2)    $f_{S_n}(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-iux} \phi_{S_n}(u)\, du$.

In order to evaluate the infinite integral in (4.2), one will generally have to
use the theory of complex integration and the calculus of residues.
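When the residue calculus is inconvenient, (4.2) can also be evaluated numerically. The sketch below assumes $S_3 = X_1 + X_2 + X_3$ with each $X_k$ uniform on 0 to 1, so that by (4.1) $\phi_{S_3}(u) = [(e^{iu} - 1)/(iu)]^3$, which is absolutely integrable. It truncates the integral to [-400, 400] (an assumption; the neglected tail is below $10^{-5}$) and compares the result with the closed-form density of exercise 4.3, which for n = 3 gives 0.125 at x = 0.5 and 0.75 at x = 1.5.

```python
import cmath
import math


def uniform_cf(u: float) -> complex:
    """Characteristic function of the uniform law on (0, 1): (e^{iu} - 1)/(iu), equal to 1 at u = 0."""
    if abs(u) < 1e-12:
        return 1.0 + 0j
    return (cmath.exp(1j * u) - 1) / (1j * u)


def sum_density_via_cf(x: float, n: int = 3, L: float = 400.0, steps: int = 80000) -> float:
    """(1/2pi) * int_{-L}^{L} e^{-iux} phi(u)^n du by Simpson's rule (steps must be even)."""
    h = 2 * L / steps
    f = lambda u: cmath.exp(-1j * u * x) * uniform_cf(u) ** n
    total = f(-L) + f(L)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * f(-L + j * h)
    return (total * h / 3).real / (2 * math.pi)


if __name__ == "__main__":
    for x in (0.5, 1.5, 2.5):
        print(x, sum_density_via_cf(x))  # approximately 0.125, 0.75, 0.125
```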




Even if one is unable to invert the characteristic function to obtain the
probability law of $S_n$ in closed form, the characteristic function can still be
used to obtain the moments and cumulants of $S_n$. Indeed, cumulants
assume their real importance from the study of the sums of independent
random variables because they are additive over the summands. More
precisely, if $X_1, X_2, \ldots, X_n$ are independent random variables whose rth
cumulants exist, then the rth cumulant of the sum exists and is equal to the
sum of the rth cumulants of the individual random variables. In symbols,

(4.3)    $K_r[X_1 + \cdots + X_n] = K_r[X_1] + \cdots + K_r[X_n]$.

Equation (4.3) follows immediately from the fact that the rth cumulant is
(up to a constant) the rth derivative at 0 of the logarithm of the characteristic
function and the log-characteristic function is additive over independent
summands, since the characteristic function is multiplicative.

The moments and central moments of a random variable may be
expressed in terms of its cumulants. In particular, the first cumulant and
the mean, the second cumulant and the variance, and the third cumulant
and the third central moment, respectively, are equal. Consequently, the
means, variances, and third central moments are additive over independent
summands; more precisely,

(4.4)    $E[X_1 + \cdots + X_n] = E[X_1] + \cdots + E[X_n]$
         $\mathrm{Var}[X_1 + \cdots + X_n] = \mathrm{Var}[X_1] + \cdots + \mathrm{Var}[X_n]$
         $\mu_3[X_1 + \cdots + X_n] = \mu_3[X_1] + \cdots + \mu_3[X_n]$,

where, for any random variable X, we define $\mu_3[X] = E[(X - E[X])^3]$;
(4.4) may, of course, also be proved directly.
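The additivity statements (4.4) can be checked exactly for small discrete examples. The sketch below builds the distribution of $X_1 + X_2$ for two independent discrete random variables by convolution; the two probability tables are arbitrary illustrative choices. It confirms that variances and third central moments add, and that fourth central moments do not simply add: since $\mu_4 = K_4 + 3K_2^2$ by (2.25) and cumulants are additive by (4.3), the fourth central moment of the sum picks up the extra term $6\,\mathrm{Var}[X_1]\,\mathrm{Var}[X_2]$.

```python
from collections import defaultdict


def convolve_pmfs(p1: dict, p2: dict) -> dict:
    """pmf of X1 + X2 for independent discrete X1, X2 given as {value: probability}."""
    out = defaultdict(float)
    for x1, q1 in p1.items():
        for x2, q2 in p2.items():
            out[x1 + x2] += q1 * q2
    return dict(out)


def central_moment(pmf: dict, r: int) -> float:
    """E[(X - E[X])^r] for a discrete law."""
    mean = sum(x * q for x, q in pmf.items())
    return sum((x - mean) ** r * q for x, q in pmf.items())


x1 = {0: 0.8, 1: 0.2}            # a Bernoulli law, chosen for illustration
x2 = {0: 0.5, 1: 0.3, 5: 0.2}    # a skewed law, chosen for illustration
s = convolve_pmfs(x1, x2)

if __name__ == "__main__":
    for r in (2, 3, 4):
        print(r, central_moment(s, r), central_moment(x1, r) + central_moment(x2, r))
```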


EXERCISES 


4.1. Prove theorem 4A.
4.2. Find the probability laws corresponding to the following characteristic
functions: (i) $e^{\ldots}$, (ii) $e^{-|u|}$, (iii) $e^{(e^{iu} - 1)}$, (iv) $(1 - 2iu)^{\ldots}$.
4.3. Let $X_1, X_2, \ldots, X_n$ be a sequence of independent random variables, each
uniformly distributed over the interval 0 to 1. Let $S_n = X_1 + X_2 + \cdots +
X_n$. Show that for any real number y, such that $0 < y < n + 1$,

$f_{S_{n+1}}(y) = \int_{y-1}^{y} f_{S_n}(x)\, dx$;

hence prove by mathematical induction that

$f_{S_n}(x) = \frac{1}{(n-1)!} \sum_{j=0}^{[x]} (-1)^j \binom{n}{j} (x - j)^{n-1}$    if $0 \le x \le n$
          $= 0$    if $x < 0$ or $x > n$,

in which [x] denotes the largest integer not exceeding x.

4.4. Let $X_1, X_2, \ldots, X_n$ be a sequence of independent random variables, each
normally distributed with mean 0 and variance 1. Let $S_n = X_1^2 + X_2^2 +
\cdots + X_n^2$. Show that for any real number y and integer n = 1, 2, ...

$f_{S_{n+1}}(y) = \int_0^y f_{S_n}(y - x)\, f_{X_{n+1}^2}(x)\, dx$.

Prove that $f_{S_2}(y) = \frac{1}{2} e^{-y/2}$ for y > 0; hence deduce that $S_n$ has a $\chi^2$
distribution with n degrees of freedom.
4.5. Let $X_1, X_2, \ldots, X_n$ be independent random variables, each normally
distributed with mean m and variance 1. Let $S = \sum_{j=1}^{n} X_j^2$.
(i) Find the cumulants of S.
(ii) Let $T = aY_\nu$, for suitable constants a and ν, in which $Y_\nu$ is a random
variable obeying a $\chi^2$ distribution with ν degrees of freedom. Determine
a and ν so that S and T have the same means and variances. Hint: Show
that each $X_j^2$ has the characteristic function

$\phi_{X_j^2}(u) = \frac{1}{\sqrt{1 - 2iu}} \exp\left(\frac{ium^2}{1 - 2iu}\right)$.
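The formula of exercise 4.3 and the recursion used to derive it can be compared numerically. The sketch below codes the closed form (the function name `irwin_hall_density` is our own label; the law is often called the Irwin-Hall distribution) and checks the recursion $f_{S_{n+1}}(y) = \int_{y-1}^{y} f_{S_n}(x)\,dx$ at a few points with Simpson's rule. A numerical check of this kind does not replace the induction proof the exercise asks for.

```python
import math


def irwin_hall_density(x: float, n: int) -> float:
    """Density of S_n = X_1 + ... + X_n with X_k independent uniform on (0, 1) (exercise 4.3)."""
    if x < 0 or x > n:
        return 0.0
    total = sum(
        (-1) ** j * math.comb(n, j) * (x - j) ** (n - 1)
        for j in range(math.floor(x) + 1)
    )
    return total / math.factorial(n - 1)


def simpson(f, a: float, b: float, steps: int = 10000) -> float:
    """Composite Simpson's rule on [a, b] with an even number of steps."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * f(a + j * h)
    return total * h / 3


if __name__ == "__main__":
    n, y = 3, 1.7
    lhs = irwin_hall_density(y, n + 1)
    rhs = simpson(lambda x: irwin_hall_density(x, n), y - 1, y)
    print(lhs, rhs)
```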


5. PROOFS OF THE INVERSION FORMULAS 
FOR CHARACTERISTIC FUNCTIONS 


In order to study the proofs of the inversion formulas, we need
the following basic facts concerning the conditions under which
limiting operations may be interchanged.
These facts are stated here without proof. We begin with the conditions under which, given
a convergent sequence of functions $g_n(\cdot)$, the limit of expectations is equal
to the expectation of the limit.


THEOREM 5A. Let $g_n(\cdot)$ and $g(\cdot)$ be Borel functions of a real variable x
such that at each real number x

(5.1)    $\lim_{n \to \infty} g_n(x) = g(x)$.

If a Borel function $G(\cdot)$ exists such that

(5.2)    $|g_n(x)| \le G(x)$ for all real x and integers n,

and if $E[G(X)] = \int_{-\infty}^{\infty} G(x)\, dF_X(x)$ is finite, then

(5.3)    $\lim_{n \to \infty} E[g_n(X)] = E[\lim_{n \to \infty} g_n(X)] = E[g(X)]$.




In particular, it may happen that (5.2) will hold with G(x) equal for all x
to a finite constant C. Since E[C] = C is finite, it follows that (5.3) will
hold. Since this is a case frequently encountered, we introduce a special
terminology for it: the sequence of functions $g_n(\cdot)$ is said to converge
boundedly to $g(\cdot)$ if (5.1) holds and if there exists a finite constant C such that

(5.4)    $|g_n(x)| \le C$ for all real x and integers n.

From theorem 5A it follows that (5.3) will hold for a sequence of functions
converging boundedly. This assertion is known as the Lebesgue bounded
convergence theorem. Theorem 5A is known as the Lebesgue dominated
convergence theorem.
Theorem 5A may be extended to the case in which there is a function of
two real variables g(x, u) instead of a sequence of functions $g_n(x)$.

THEOREM 5B. Let g(x, u) be a Borel function of two variables such that
at all real numbers x and u

(5.5)    $\lim_{u' \to u} g(x, u') = g(x, u)$.

Note that (5.5) says that g(x, u) is continuous as a function of u at each x.
If a Borel function G(x) exists such that

(5.6)    $|g(x, u)| \le G(x)$ for all real x and u,

and if E[G(X)] is finite, then for any real number u

(5.7)    $\lim_{u' \to u} E[g(X, u')] = E[g(X, u)]$.

Note that (5.7) says that E[g(X, u)] is continuous as a function of u.

We next consider the problem of differentiating and integrating a
function of the form of E[g(X, u)].

THEOREM 5C. Let g(x, u) be a Borel function of two variables such that
the partial derivative $\partial g(x, u)/\partial u$ with respect to u exists at all real
numbers x and u. If a Borel function $G(\cdot)$ exists such that

(5.8)    $\left|\frac{\partial g(x, u)}{\partial u}\right| \le G(x)$ for all x and u,

and if E[G(X)] is finite, then for any real number u

(5.9)    $\frac{d}{du} E[g(X, u)] = E\left[\frac{\partial g(X, u)}{\partial u}\right]$.

As one consequence of theorem 5C, we may deduce (2.10).


410 SUMS OF INDEPENDENT RANDOM VARIABLES CH. 9 


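THEOREM 5D. Let $g(x, u)$ be a Borel function of two variables such that (5.5) will hold. If a Borel function $G(\cdot)$ exists such that

(5.10)  $\int_{-\infty}^{\infty} |g(x, u)|\, du \le G(x)$ for all $x$

and if $E[G(X)]$ is finite, then

(5.11)  $\int_{-\infty}^{\infty} E[g(X, u)]\, du = E\left[\int_{-\infty}^{\infty} g(X, u)\, du\right]$; that is, $\int_{-\infty}^{\infty} du \int_{-\infty}^{\infty} dF_X(x)\, g(x, u) = \int_{-\infty}^{\infty} dF_X(x) \int_{-\infty}^{\infty} du\, g(x, u)$.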


It should be noted that the integrals in (5.11) involving integration in the variable $u$ may be interpreted as Riemann integrals if we assume that (5.5) holds. However, the assertion (5.11) is valid even without assuming (5.5) if we interpret the integrals in $u$ as Lebesgue integrals.

Finally, we give a theorem, analogous to theorem 5A, for Lebesgue integrals over the real line.

THEOREM 5E. Let $h_n(\cdot)$ and $h(\cdot)$ be Borel functions of a real variable such that at each real number $u$

(5.12)  $\lim_{n\to\infty} h_n(u) = h(u)$.

If a function $H(u)$ exists such that

(5.13)  $|h_n(u)| \le H(u)$ for all real $u$ and integers $n$

and if $\int_{-\infty}^{\infty} H(u)\, du$ is finite, then

(5.14)  $\lim_{n\to\infty} \int_{-\infty}^{\infty} h_n(u)\, du = \int_{-\infty}^{\infty} h(u)\, du$.

Theorem 5E, like theorem 5A, is a special case of a general result of the theory of abstract Lebesgue integrals, called the Lebesgue dominated convergence theorem.


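We next discuss the proofs of the inversion formulas for characteristic functions. In writing the proofs, we omit the subscript $X$ on the distribution function $F_X(\cdot)$ and the characteristic function $\phi_X(\cdot)$. We first prove (3.13). We note that

$\dfrac{1}{2U}\int_{-U}^{U} e^{-iux}\phi(u)\, du = \dfrac{1}{2U}\int_{-U}^{U} du\, e^{-iux}\int_{-\infty}^{\infty} e^{iuy}\, dF(y) = \int_{-\infty}^{\infty} dF(y)\left[\dfrac{1}{2U}\int_{-U}^{U} e^{iu(y-x)}\, du\right],$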




in which the interchange of the order of integration is justified by theorem 5D. Now define the functions

$g(y, U) = \dfrac{1}{2U}\int_{-U}^{U} e^{iu(y-x)}\, du = \dfrac{\sin U(y-x)}{U(y-x)}$ if $y \ne x$, $= 1$ if $y = x$;

$g(y) = 0$ if $y \ne x$, $= 1$ if $y = x$.

Clearly, at each $y$, $g(y, U)$ converges boundedly to $g(y)$ as $U$ tends to $\infty$. Therefore, by theorem 5A,

$\lim_{U\to\infty} \dfrac{1}{2U}\int_{-U}^{U} e^{-iux}\phi(u)\, du = \lim_{U\to\infty}\int_{-\infty}^{\infty} g(y, U)\, dF(y) = \int_{-\infty}^{\infty} g(y)\, dF(y) = F(x+0) - F(x-0).$
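Formula (3.13) can be checked numerically. The Python sketch below is our own illustration, and its choices are assumptions made for the example. It takes $X$ binomial with parameters 3 and $1/2$, so that $\phi(u) = (q + pe^{iu})^3$ and the jump of $F$ at $x = 1$ is $3/8$. It then approximates $\frac{1}{2U}\int_{-U}^{U} e^{-iux}\phi(u)\,du$ by the midpoint rule. By the computation above, the exact value of this average is $\sum_y P[X = y]\, g(y, U)$, so it differs from the jump by at most $1/(U \min_{y \ne x}|y - x|)$.

```python
import cmath


def binomial_cf(u, n=3, p=0.5):
    # Characteristic function of a binomial(n, p) random variable: (q + p e^{iu})^n.
    return (1 - p + p * cmath.exp(1j * u)) ** n


def jump_estimate(cf, x, U, steps=40_000):
    # Midpoint rule for (1/2U) * integral from -U to U of e^{-iux} phi(u) du,
    # the quantity whose limit as U -> infinity is F(x + 0) - F(x - 0) by (3.13).
    h = 2.0 * U / steps
    total = 0j
    for k in range(steps):
        u = -U + (k + 0.5) * h
        total += cmath.exp(-1j * u * x) * cf(u)
    return (total * h / (2.0 * U)).real


for U in (10.0, 100.0, 1000.0):
    # x = 1 is an atom with P[X = 1] = 3/8; x = 0.5 is a continuity point (no jump).
    print(U, jump_estimate(binomial_cf, 1.0, U), jump_estimate(binomial_cf, 0.5, U))
```

As $U$ grows, the estimate at $x = 1$ approaches $3/8$ and the estimate at $x = 0.5$ approaches 0, in agreement with $F(x+0) - F(x-0)$.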
We next prove (3.12). It may be verified that

$\mathrm{Im}\,[e^{-iux}\phi(u)] = E[\sin u(X - x)]$

for any real numbers $u$ and $x$. Consequently, for any $U > 0$

(5.15)  $\dfrac{2}{\pi}\int_0^U \dfrac{\mathrm{Im}\,[e^{-iux}\phi(u)]}{u}\, du = \int_{-\infty}^{\infty} dF(y)\, \dfrac{2}{\pi}\int_0^U \dfrac{\sin u(y-x)}{u}\, du,$

in which the interchange of integrals in (5.15) is justified by theorem 5D.


Now it may be proved that

(5.16)  $\lim_{U\to\infty} \dfrac{2}{\pi}\int_0^U \dfrac{\sin ut}{u}\, du = 1$ if $t > 0$, $= 0$ if $t = 0$, $= -1$ if $t < 0$,

in which the convergence is bounded; that is, the integrals on the left-hand side are bounded uniformly in $U$ and $t$. A proof of (5.16) may be sketched as follows. Define

$G(a) = \int_0^\infty e^{-au}\,\dfrac{\sin ut}{u}\, du.$

Verify that the improper integral defining $G(a)$ converges uniformly for $a \ge 0$ and that this implies that

$\int_0^\infty \dfrac{\sin ut}{u}\, du = \lim_{a\to 0} G(a).$

Now

(5.17)  $\int_0^\infty e^{-au}\cos ut\, du = \dfrac{a}{a^2 + t^2},$

in which, for each $a > 0$, the integral in (5.17) converges uniformly for all $t$. Verify that this implies that $G(a) = \tan^{-1}(t/a)$. To do so, differentiate $G(a)$ with respect to $t$ under the integral sign and note that $G(a) = 0$ when $t = 0$. As $a$ tends to 0, $\tan^{-1}(t/a)$ tends to $\pi/2$ or to $-\pi/2$, depending on whether $t > 0$ or $t < 0$. The proof of (5.16) is complete.
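The identity $G(a) = \tan^{-1}(t/a)$ used in the sketch can be spot-checked numerically. The Python sketch below is our own illustration; the truncation point and the step count are arbitrary choices. It approximates $G(a)$ by the midpoint rule and cuts the integral off where $e^{-au}$ falls below about $10^{-12}$.

```python
import math


def G(a, t, steps=100_000):
    # Midpoint-rule approximation of G(a) = integral_0^inf e^{-au} sin(ut)/u du for a > 0.
    # Since |sin(ut)/u| <= |t|, the tail beyond u = 28/a is at most |t| e^{-28} / a.
    upper = 28.0 / a
    h = upper / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * h
        total += math.exp(-a * u) * math.sin(u * t) / u
    return total * h


for a in (1.0, 0.1, 0.01):
    for t in (2.0, -0.5):
        print(a, t, G(a, t), math.atan(t / a))
```

For each $a$ the two printed values agree closely. As $a$ decreases, both approach $\pm\pi/2$, which is the limit used in (5.16).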


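Now define

$g(y) = -1$ if $y < x$, $= 0$ if $y = x$, $= 1$ if $y > x$.

By (5.16), it follows that the integrand of the integral on the right-hand side of (5.15) tends to $g(y)$ boundedly as $U$ tends to $\infty$. Consequently, we have proved that

$\dfrac{2}{\pi}\int_0^\infty \dfrac{\mathrm{Im}\,[e^{-iux}\phi(u)]}{u}\, du = \int_{-\infty}^{\infty} g(y)\, dF(y) = 1 - 2F(x).$

The last equality holds at every continuity point $x$ of $F$, since in general $\int_{-\infty}^{\infty} g(y)\, dF(y) = P[X > x] - P[X < x] = 1 - F(x) - F(x - 0)$. The proof of (3.12) is complete.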
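Rearranged, this result reads $F(x) = \frac12 - \frac1\pi\int_0^\infty u^{-1}\,\mathrm{Im}\,[e^{-iux}\phi(u)]\,du$ at continuity points. The Python sketch below is our own illustration. It assumes the standard normal law, with $\phi(u) = e^{-u^2/2}$, and an integration cutoff at $u = 12$. It compares the inversion formula with the normal distribution function computed from `math.erf`.

```python
import cmath
import math


def normal_cf(u):
    # Characteristic function of the standard normal law.
    return math.exp(-0.5 * u * u)


def cdf_from_cf(cf, x, upper=12.0, steps=20_000):
    # F(x) = 1/2 - (1/pi) * integral_0^inf Im[e^{-iux} phi(u)] / u du (x a continuity point),
    # truncated at u = upper, which suffices here because phi(u) decays like e^{-u^2/2}.
    h = upper / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * h
        total += (cmath.exp(-1j * u * x) * cf(u)).imag / u
    return 0.5 - total * h / math.pi


def normal_cdf(x):
    # Reference values from the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


for x in (-1.5, 0.0, 0.7, 2.0):
    print(x, cdf_from_cf(normal_cf, x), normal_cdf(x))
```

For a law whose characteristic function decays slowly, the cutoff would have to be much larger. The truncation here relies on the Gaussian decay.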
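We next prove (3.4). We have

(5.18)  $\int_{-U}^{U}\left(1 - \dfrac{|u|}{U}\right)\phi(u)\gamma(u)\, du = \int_{-\infty}^{\infty} dF(x)\int_{-U}^{U} du\, e^{iux}\left(1 - \dfrac{|u|}{U}\right)\dfrac{1}{2\pi}\int_{-\infty}^{\infty} e^{-iuy}g(y)\, dy = \int_{-\infty}^{\infty} dF(x)\int_{-\infty}^{\infty} g(y)\, U K(U(x - y))\, dy,$

in which we define the function $K(\cdot)$ for any real number $z$ by

(5.19)  $K(z) = \dfrac{1}{2\pi}\left(\dfrac{\sin (z/2)}{z/2}\right)^2 = \dfrac{1}{\pi}\int_0^1 (1 - v)\cos vz\, dv;$

(5.18) follows from the fact that

$\dfrac{1}{2\pi}\int_{-U}^{U} e^{iu(x-y)}\left(1 - \dfrac{|u|}{U}\right) du = \dfrac{U}{2\pi}\int_{-1}^{1} e^{ivU(x-y)}(1 - |v|)\, dv = \dfrac{U}{\pi}\int_0^1 (1 - v)\cos vU(x - y)\, dv = U K(U(x - y)).$

To conclude the proof of (3.4), it suffices to show that

(5.20)  $g_U(x) = \int_{-\infty}^{\infty} g(y)\, U K(U(x - y))\, dy$

converges boundedly to $g^*(x)$ as $U$ tends to $\infty$. We now show that this holds, using the facts that $K(\cdot)$ is even, nonnegative, and integrates to 1; in symbols, for any real number $u$

(5.21)  $K(-u) = K(u)$, $K(u) \ge 0$, $\int_{-\infty}^{\infty} K(u)\, du = 1$.

In other words, $K(\cdot)$ is a probability density function symmetric about 0.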
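The properties (5.21), and the agreement of the two expressions in (5.19), are easy to confirm numerically. The Python sketch below is our own illustration; the step counts and the truncation length $L$ are arbitrary. It evaluates both forms of $K$ and the integral of $K$ over a long interval. Since $K(z) \le 2/(\pi z^2)$, the mass outside $[-L, L]$ is at most $4/(\pi L)$.

```python
import math


def K(z):
    # Closed form in (5.19): (1/2pi) * (sin(z/2) / (z/2))^2, with K(0) = 1/(2pi).
    if z == 0.0:
        return 1.0 / (2.0 * math.pi)
    s = math.sin(z / 2.0) / (z / 2.0)
    return s * s / (2.0 * math.pi)


def K_integral_form(z, steps=2_000):
    # Second form in (5.19): (1/pi) * integral_0^1 (1 - v) cos(vz) dv, by the midpoint rule.
    h = 1.0 / steps
    total = sum((1.0 - (k + 0.5) * h) * math.cos((k + 0.5) * h * z) for k in range(steps))
    return total * h / math.pi


def K_mass(L=2000.0, steps=200_000):
    # Integral of K over [-L, L] by the midpoint rule; since K(z) <= 2/(pi z^2),
    # the omitted tails contribute at most 4/(pi L).
    h = 2.0 * L / steps
    return sum(K(-L + (k + 0.5) * h) for k in range(steps)) * h


for z in (0.0, 1.0, 3.0, 10.0):
    print(z, K(z), K_integral_form(z))
print("mass on [-2000, 2000]:", K_mass())
```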
In (5.20) make the change of variable $t = y - x$. Since $K(\cdot)$ is even, it follows that

(5.22)  $g_U(x) = \int_{-\infty}^{\infty} g(x + t)\, U K(Ut)\, dt.$

By making the change of variable $t' = -t$ in (5.22) and again using the fact that $K(\cdot)$ is even, we determine that

(5.23)  $g_U(x) = \int_{-\infty}^{\infty} g(x - t)\, U K(Ut)\, dt.$

Consequently, by adding (5.22) and (5.23) and then dividing by 2, we show that

(5.24)  $g_U(x) = \int_{-\infty}^{\infty} \dfrac{g(x + t) + g(x - t)}{2}\, U K(Ut)\, dt.$

Define $h(t) = \dfrac{g(x + t) + g(x - t)}{2} - g^*(x)$. From (5.24) it follows that

(5.25)  $g_U(x) - g^*(x) = \int_{-\infty}^{\infty} h(t)\, U K(Ut)\, dt.$

Now let $C$ be a constant such that $2|g(y)| \le C$ for any real number $y$. Then, for any positive number $d$ and for all $U$ and $x$

(5.26)  $|g_U(x) - g^*(x)| \le \sup_{|t| \le d}|h(t)| \int_{|t| \le d} U K(Ut)\, dt + C\int_{|t| > d} U K(Ut)\, dt \le \sup_{|t| \le d}|h(t)| + C\int_{|s| > Ud} K(s)\, ds.$

For $d$ fixed, $\int_{|s| > Ud} K(s)\, ds$ tends to 0 as $U$ tends to $\infty$. Next, by the definition of $h(t)$ and $g^*(x)$, $\sup_{|t| \le d}|h(t)|$ tends to 0 as $d$ tends to 0. Consequently, by letting first $U$ tend to infinity and then $d$ tend to 0 in (5.26), it follows that $g_U(x)$ tends boundedly to $g^*(x)$ as $U$ tends to $\infty$. The proof of (3.4) is complete.
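To see (5.20) at work, take for $g$ the indicator of the half-line $y \ge 0$. This is an illustrative choice of ours, and its jump at 0 gives $g^*(0) = 1/2$. By (5.22), $g_U(x) = \int_{-Ux}^{\infty} K(s)\, ds$. The Python sketch below evaluates this integral by the midpoint rule, truncating at $s = 4000$. The truncation removes at most $2/(4000\pi) < 2 \times 10^{-4}$ of mass from the upper tail.

```python
import math


def K(z):
    # Kernel (5.19): (1/2pi) * (sin(z/2) / (z/2))^2, with K(0) = 1/(2pi).
    if z == 0.0:
        return 1.0 / (2.0 * math.pi)
    s = math.sin(z / 2.0) / (z / 2.0)
    return s * s / (2.0 * math.pi)


def g_U_step(x, U, S=4000.0, steps=100_000):
    # g_U(x) for g = indicator of [0, infinity): by (5.22) it equals the integral of K
    # from -Ux to infinity, approximated here by the midpoint rule on [-Ux, S] (needs -Ux < S).
    lo = -U * x
    h = (S - lo) / steps
    return sum(K(lo + (k + 0.5) * h) for k in range(steps)) * h


for U in (1.0, 10.0, 100.0):
    print(U, g_U_step(-0.5, U), g_U_step(0.0, U), g_U_step(0.5, U))
```

At $x = 0$ the values stay near $1/2 = g^*(0)$. On either side of the jump they approach 0 and 1 as $U$ grows, but only slowly, because the tails of $K$ decrease like $1/z^2$.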


CHAPTER 10

Sequences of Random Variables


The basic concepts of probability theory, such as the probability of a random event (or the mean of a random variable), have been given intuitive meanings as approximately representing certain averages computed from a large sample of independent observed values of the event (or of the random variable). In this chapter we treat the problem of giving an exact mathematical meaning to the word "approximately" as it is employed in the foregoing sentence. At the same time, our discussion leads to an answer to the question of what constitutes an approximate solution to the problem of finding the probability law of the sum of random variables. A basic role in this study is played by the notion of the convergence of a sequence of random variables.


1. MODES OF CONVERGENCE OF A SEQUENCE 
OF RANDOM VARIABLES 


Consider a sequence of jointly distributed random variables $Z_1, Z_2, \ldots, Z_n, \ldots$, defined on the same probability space $S$ on which a probability function $P[\cdot]$ has been defined. Let $Z$ be another random variable defined on the same probability space. The notion of the convergence of the sequence of random variables $Z_n$ to the random variable $Z$ can be defined in several ways.

We consider first the notion of convergence with probability one. We say 



SEC. 1 MODES OF CONVERGENCE 415 
that $Z_n$ converges to $Z$ with probability one if $P[\lim_{n\to\infty} Z_n = Z] = 1$ or, in words, if for almost all members $s$ of the probability space $S$ on which the random variables are defined $\lim_{n\to\infty} Z_n(s) = Z(s)$. To prove that a sequence of random variables $Z_n$ converges with probability one is often technically a difficult problem. Consequently, two other types of convergence of random variables, called, respectively, convergence in mean square and convergence in probability, have been introduced in probability theory. These modes of convergence are simpler to deal with than convergence with probability one and at the same time are conceptually similar to it.

The sequence $Z_1, Z_2, \ldots, Z_n, \ldots$ is said to converge in mean square to the random variable $Z$, denoted $\operatorname{l.i.m.}_{n\to\infty} Z_n = Z$, if $\lim_{n\to\infty} E[(Z_n - Z)^2] = 0$ or, in words, if the mean square difference between $Z_n$ and $Z$ tends to 0.

The sequence $Z_1, Z_2, \ldots, Z_n, \ldots$ is said to converge in probability to the random variable $Z$, denoted $\operatorname{plim}_{n\to\infty} Z_n = Z$, if for every positive number $\epsilon$

(1.1)  $\lim_{n\to\infty} P[|Z_n - Z| > \epsilon] = 0.$

Equation (1.1) may be expressed in words: for any fixed difference $\epsilon$ the probability of the event that $Z_n$ and $Z$ differ by more than $\epsilon$ becomes arbitrarily close to 0 as $n$ tends to infinity.

Convergence in probability derives its importance from the fact that, like convergence with probability one, it can be considered without assuming that any moments exist, whereas convergence in mean square requires finite second moments. It is immediate that if convergence in mean square holds then so does convergence in probability; one need only consider the following form of Chebyshev's inequality: for any $\epsilon > 0$

(1.2)  $P[|Z_n - Z| > \epsilon] \le \dfrac{1}{\epsilon^2} E[(Z_n - Z)^2].$
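A small simulation illustrates (1.2). In the Python sketch below, which is our own illustration, $Z_n$ is the mean of $n$ independent variables uniform on 0 to 1, and $Z = 1/2$. The uniform summands, the sample sizes, and the seed are assumptions made for the example. With these choices, $E[(Z_n - Z)^2] = 1/(12n)$ exactly. The simulated probability of a deviation larger than $\epsilon$ should fall below the Chebyshev bound, which is usually far from sharp.

```python
import random


def deviation_probability(n, eps, trials=2_000, seed=1):
    # Monte Carlo estimate of P[|Z_n - 1/2| > eps], Z_n = mean of n uniform(0, 1) variables.
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        z = sum(rng.random() for _ in range(n)) / n
        if abs(z - 0.5) > eps:
            hits += 1
    return hits / trials


def chebyshev_bound(n, eps):
    # Right-hand side of (1.2), using E[(Z_n - 1/2)^2] = 1/(12 n).
    return 1.0 / (12.0 * n * eps * eps)


for n in (10, 100, 1000):
    print(n, deviation_probability(n, 0.05), chebyshev_bound(n, 0.05))
```

For $n = 10$ the bound exceeds 1 and says nothing. For larger $n$ it is valid but conservative, which is all that is needed to deduce convergence in probability from convergence in mean square.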


The relation that exists between convergence with probability one and convergence in probability is best understood by considering the following characterization of convergence with probability one, which we state without proof. Let $Z_1, Z_2, \ldots, Z_n, \ldots$ be a sequence of jointly distributed random variables; $Z_n$ converges to the random variable $Z$ with probability one if and only if for every $\epsilon > 0$

(1.3)  $\lim_{N\to\infty} P\left[\sup_{n \ge N}|Z_n - Z| > \epsilon\right] = 0.$

On the other hand, the sequence $\{Z_n\}$ converges to $Z$ in probability if and


416 SEQUENCES OF RANDOM VARIABLES cH. 10 


only if for every $\epsilon > 0$ (1.1) holds. Now, it is clear that if $|Z_N - Z| > \epsilon$, then $\sup_{n \ge N}|Z_n - Z| > \epsilon$. Consequently,

$P[|Z_N - Z| > \epsilon] \le P\left[\sup_{n \ge N}|Z_n - Z| > \epsilon\right],$

and (1.3) implies (1.1). Thus, if $Z_n$ converges to $Z$ with probability one, it converges to $Z$ in probability.

Convergence with probability one of the sequence $\{Z_n\}$ to $Z$ implies that one can make a probability statement simultaneously about all but a finite number of members of the sequence $\{Z_n\}$: given any positive numbers $\epsilon$ and $\delta$, an integer $N$ exists such that

(1.4)  $P[|Z_N - Z| < \epsilon, |Z_{N+1} - Z| < \epsilon, |Z_{N+2} - Z| < \epsilon, \ldots] \ge 1 - \delta.$


On the other hand, convergence in probability of the sequence $\{Z_n\}$ to $Z$ implies only that one can make a separate probability statement about each of all but a finite number of members of the sequence $\{Z_n\}$: given any positive numbers $\epsilon$ and $\delta$, an integer $N$ exists such that

(1.5)  $P[|Z_N - Z| < \epsilon] \ge 1 - \delta, \quad P[|Z_{N+1} - Z| < \epsilon] \ge 1 - \delta, \quad P[|Z_{N+2} - Z| < \epsilon] \ge 1 - \delta, \ldots.$


One thus sees that convergence in probability is implied both by convergence with probability one and by convergence in mean square. However, without additional conditions, convergence in probability implies neither convergence in mean square nor convergence with probability one. Further, convergence with probability one neither implies nor is implied by convergence in mean square.

The following theorem gives a condition under which convergence in mean square implies convergence with probability one.

THEOREM 1A. If a sequence $Z_n$ converges in mean square to 0 in such a way that

(1.6)  $\sum_{n=1}^{\infty} E[Z_n^2] < \infty,$

then it follows that $Z_n$ converges to 0 with probability one.
Proof: From (1.6) it follows that

(1.7)  $E\left[\sum_{n=1}^{\infty} Z_n^2\right] = \sum_{n=1}^{\infty} E[Z_n^2] < \infty,$

since it may be shown that for an infinite series of nonnegative summands the expectation of the sum is equal to the sum of the expectations. Next,


SEC. 2 THE LAW OF LARGE NUMBERS 417 


from the fact that the infinite series $\sum_{n=1}^{\infty} Z_n^2$ has finite mean it follows that it is finite with probability one; in symbols,

(1.8)  $P\left[\sum_{n=1}^{\infty} Z_n^2 < \infty\right] = 1.$

If an infinite series converges, then its general term tends to 0. Therefore, from (1.8) it follows that

(1.9)  $P\left[\lim_{n\to\infty} Z_n = 0\right] = 1.$

The proof of theorem 1A is complete. Although the proof of theorem 1A is completely rigorous, it requires for its justification two basic facts of the theory of integration over probability spaces that have not been established in this book.


2. THE LAW OF LARGE NUMBERS 


The fundamental empirical fact upon which are based all applications of the theory of probability is expressed in the empirical law of large numbers, first formulated by Poisson (in his book, Recherches sur la probabilité des jugements, 1837):


In many different fields, empirical phenomena appear to obey a certain general law, which can be called the Law of Large Numbers. This law states that the ratios of numbers derived from the observation of a very large number of similar events remain practically constant, provided that these events are governed partly by constant factors and partly by variable factors whose variations are irregular and do not cause a systematic change in a definite direction. Certain values of these relations are characteristic of each given kind of event. With the increase in length of the series of observations the ratios derived from such observations come nearer and nearer to these characteristic constants. They could be expected to reproduce them exactly if it were possible to make series of observations of an infinite length.
In the mathematical theory of probability one may prove a proposition, called the mathematical law of large numbers, that may be used to gain insight into the circumstances under which the empirical law of large numbers is expected to hold. For an interesting philosophical discussion of the relation between the empirical and the mathematical laws of large numbers and for the foregoing quotation from Poisson the reader should consult Richard von Mises, Probability, Statistics, and Truth, second revised edition, Macmillan, New York, 1957, pp. 104-134.




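A sequence of jointly distributed random variables, $X_1, X_2, \ldots, X_n, \ldots$, is said to obey the (classical) law of large numbers if

(2.1)  $\dfrac{X_1 + X_2 + \cdots + X_n}{n} - \dfrac{E[X_1] + E[X_2] + \cdots + E[X_n]}{n} \to 0$ as $n \to \infty$.

… not necessarily identically distributed, random variables … if, for some $\delta > 0$

(2.2)  $\lim_{n\to\infty} \dfrac{1}{n^{1+\delta}}\sum_{k=1}^{n} E\left[|X_k - E[X_k]|^{1+\delta}\right] = 0,$

then

(2.3)  $\operatorname{plim}_{n\to\infty} \dfrac{1}{n}\sum_{k=1}^{n}(X_k - E[X_k]) = 0.$

…

(2.4)  $C_n = E[X_n Z_n] = \dfrac{1}{n}\sum_{k=1}^{n} E[X_k X_n]$

between the $n$th summand $X_n$ and the $n$th sample mean

$Z_n = \dfrac{X_1 + X_2 + \cdots + X_n}{n}.$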




Let us examine the possible behavior of $C_n$ under various assumptions on the sequence $\{X_n\}$ and under the assumption that the variances $\mathrm{Var}[X_n]$ are uniformly bounded; that is, there is a constant $M$ such that

(2.5)  $\sigma_n^2 = \mathrm{Var}[X_n] \le M$ for all $n$.

If the random variables $\{X_n\}$ are independent (with zero means), then $E[X_k X_n] = 0$ if $k < n$. Consequently, $C_n = \sigma_n^2/n$, which, under condition (2.5), tends to 0 as $n$ tends to $\infty$. This is also the case if the random variables $\{X_n\}$ are assumed to be orthogonal. The sequence of random variables $\{X_n\}$ is said to be orthogonal if, for any integer $k$ and integer $m \ne 0$, $E[X_k X_{k+m}] = 0$. Then, again, $C_n = \sigma_n^2/n$.

More generally, let us consider random variables $\{X_n\}$ that are stationary (in the wide sense); this means that there is a function $R(m)$, defined for $m = 0, 1, 2, \ldots$, such that, for any integers $k$ and $m$,

(2.6)  $E[X_k X_{k+m}] = R(m).$

It is clear that an orthogonal sequence of random variables (in which all the random variables have the same variance $\sigma^2$) is stationary, with $R(m) = \sigma^2$ or 0, depending on whether $m = 0$ or $m > 0$. For a stationary sequence the value of $C_n$ is given by

(2.7)  $C_n = \dfrac{1}{n}\sum_{m=0}^{n-1} R(m).$
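Formula (2.7) is easy to evaluate. The Python sketch below is our own illustration. It computes $C_n$ for two covariance functions chosen for the example. The first is the orthogonal case $R(0) = \sigma^2$, $R(m) = 0$ for $m > 0$, which gives $C_n = \sigma^2/n$. The second is the hypothetical choice $R(m) = \rho^m$ with $|\rho| < 1$, for which $C_n = (1 - \rho^n)/(n(1 - \rho))$.

```python
def C_n(R, n):
    # (2.7): C_n = (1/n) * sum_{m=0}^{n-1} R(m) for a wide-sense stationary sequence.
    return sum(R(m) for m in range(n)) / n


def orthogonal_R(sigma2):
    # R(m) = sigma^2 if m == 0, else 0.
    return lambda m: sigma2 if m == 0 else 0.0


def geometric_R(rho):
    # Hypothetical covariance function R(m) = rho**m with |rho| < 1.
    return lambda m: rho ** m


for n in (1, 10, 100, 1000):
    print(n, C_n(orthogonal_R(2.0), n), C_n(geometric_R(0.8), n))
```

In both cases $|C_n|$ is at most a constant times $1/n$, so these sequences satisfy the hypothesis of theorem 2B below with $q = 1$.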


We now show that under condition (2.5) a necessary and sufficient 
condition for the sample mean Z, to converge in quadratic mean to 0 is 
that C, tends to 0. In theorem 2B we state conditions for the sample 
mean Z, to converge with probability one to 0. 


THEOREM 2A. A sequence of jointly distributed random variables $\{X_n\}$ with zero mean and uniformly bounded variances obeys the quadratic mean law of large numbers (in the sense that $\lim_{n\to\infty} E[Z_n^2] = 0$) if and only if

(2.8)  $\lim_{n\to\infty} C_n = \lim_{n\to\infty} E[X_n Z_n] = 0.$

Proof: Since $E^2[X_n Z_n] \le E[X_n^2]\,E[Z_n^2]$, it is clear that if the quadratic mean law of large numbers holds and if the variances $E[X_n^2]$ are bounded uniformly in $n$, then (2.8) holds. To prove the converse, we prove first the following useful identity:

(2.9)  $E[Z_n^2] = \dfrac{2}{n^2}\sum_{k=1}^{n} k C_k - \dfrac{1}{n^2}\sum_{k=1}^{n} E[X_k^2].$




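To prove (2.9), we write the familiar formula

(2.10)  $E[(X_1 + \cdots + X_n)^2] = \sum_{k=1}^{n} E[X_k^2] + 2\sum_{k=1}^{n}\sum_{j=1}^{k-1} E[X_j X_k] = 2\sum_{k=1}^{n}\sum_{j=1}^{k} E[X_j X_k] - \sum_{k=1}^{n} E[X_k^2] = 2\sum_{k=1}^{n} k E[X_k Z_k] - \sum_{k=1}^{n} E[X_k^2],$

from which (2.9) follows by dividing through by $n^2$. The second term on the right-hand side of (2.9) is at most $M/n$ in absolute value, so it tends to 0. Therefore, in view of (2.9), to complete the proof that (2.8) implies $E[Z_n^2]$ tends to 0, it suffices to show that (2.8) implies

(2.11)  $\lim_{n\to\infty} \dfrac{1}{n^2}\sum_{k=1}^{n} k C_k = 0.$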


To see (2.11), note that for any $n > N > 0$, using $\sum_{k=1}^{n} k \le n^2$,

(2.12)  $\left|\dfrac{1}{n^2}\sum_{k=1}^{n} k C_k\right| \le \dfrac{1}{n^2}\sum_{k=1}^{N} k|C_k| + \sup_{k > N}|C_k|.$

Letting first $n$ tend to infinity and then $N$ tend to $\infty$ in (2.12), we see that (2.11) holds. The proof of theorem 2A is complete.

If it is known that $C_n$ tends to 0 as some power of $n$, then we can conclude that convergence holds with probability one.
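Identity (2.9), on which the proof of theorem 2A rests, can be verified directly for a stationary sequence, where $E[X_j X_k] = R(|j - k|)$. The Python sketch below is our own illustration, and the covariance function $R(m) = 0.8^m$ is a hypothetical choice. It computes $E[Z_n^2] = n^{-2}\sum_j\sum_k R(|j - k|)$ by brute force and compares the result with the right-hand side of (2.9), taking $C_k$ from (2.7).

```python
def mean_square_direct(R, n):
    # E[Z_n^2] = (1/n^2) * sum over j, k = 1..n of E[X_j X_k], where E[X_j X_k] = R(|j - k|).
    return sum(R(abs(j - k)) for j in range(1, n + 1) for k in range(1, n + 1)) / n**2


def mean_square_identity(R, n):
    # Right-hand side of (2.9), with C_k = (1/k) * sum_{m=0}^{k-1} R(m) as in (2.7)
    # and sum_{k=1}^{n} E[X_k^2] = n R(0).
    def C(k):
        return sum(R(m) for m in range(k)) / k

    first = 2.0 * sum(k * C(k) for k in range(1, n + 1)) / n**2
    second = n * R(0) / n**2
    return first - second


def R(m):
    # Hypothetical covariance function R(m) = 0.8**m.
    return 0.8 ** m


for n in (1, 5, 50, 200):
    print(n, mean_square_direct(R, n), mean_square_identity(R, n))
```

Both columns agree and decrease toward 0 as $n$ grows, as theorem 2A predicts, since $C_n \to 0$ for this covariance function.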


THEOREM 2B. A sequence of jointly distributed random variables $\{X_n\}$ with zero mean and uniformly bounded variances obeys the strong law of large numbers (in the sense that $P[\lim_{n\to\infty} Z_n = 0] = 1$) if positive constants $M$ and $q$ exist such that for all integers $n$

(2.13)  $|E[X_n Z_n]| = |C_n| \le \dfrac{M}{n^q}.$

Remark: For a stationary sequence of random variables [in which case $C_n$ is given by (2.7)], (2.13) holds if positive constants $M$ and $q$ exist such that for all integers $m \ge 1$

(2.14)  $|R(m)| \le \dfrac{M}{m^q}.$

Proof: If (2.13) holds, then (assuming, as we may, that $0 < q < 1$)

(2.15)  $\dfrac{1}{n^2}\sum_{k=1}^{n} k|C_k| \le \dfrac{M}{n^2}\sum_{k=1}^{n} k^{1-q} \le \dfrac{M}{n^2}\int_1^{n+1} y^{1-q}\, dy \le \dfrac{4M}{2 - q}\,\dfrac{1}{n^q}.$




By (2.15) and (2.9), it follows that for some constant $M'$ and $q > 0$

(2.16)  $E[Z_n^2] \le \dfrac{M'}{n^q}$ for all integers $n$.

Choose now any integer $r$ such that $r > 1/q$ and define a sequence of random variables $Z_1', Z_2', \ldots, Z_m', \ldots$ by taking for $Z_m'$ the $m^r$th member of the sequence $\{Z_n\}$; in symbols,

(2.17)  $Z_m' = Z_{m^r}.$

By (2.16), the sequence $\{Z_m'\}$ has a mean square satisfying

(2.18)  $E[Z_m'^2] \le \dfrac{M'}{m^{rq}}.$

If we sum (2.18) over all $m$, we obtain a convergent series, since $rq > 1$:

(2.19)  $\sum_{m=1}^{\infty} E[Z_m'^2] \le M'\sum_{m=1}^{\infty} m^{-rq} < \infty.$

Therefore, by theorem 1A, it follows that

(2.20)  $P\left[\lim_{m\to\infty} Z_m' = 0\right] = P\left[\lim_{m\to\infty} Z_{m^r} = 0\right] = 1.$

We have thus shown that a properly selected subsequence $\{Z_{m^r}\}$ of the sequence $\{Z_n\}$ converges to 0 with probability one. We complete the proof of theorem 2B by showing that the members of the sequence $\{Z_n\}$, located between successive members of the subsequence $\{Z_{m^r}\}$, do not tend to be too different from the members of the subsequence. More precisely, define

(2.21)  $U_m = \max_{m^r \le n < (m+1)^r}\left|\dfrac{n}{m^r}Z_n - Z_{m^r}\right|, \quad V_m = \max_{m^r \le n < (m+1)^r}\left|Z_n - \dfrac{n}{m^r}Z_n\right|, \quad W_m = \max_{m^r \le n < (m+1)^r}|Z_n - Z_{m^r}|.$

We claim it is clear, in view of (2.20), that to show that $P[\lim_{n\to\infty} Z_n = 0] = 1$ it suffices to show that $P[\lim_{m\to\infty} W_m = 0] = 1$. Consequently, since $W_m \le U_m + V_m$, to complete the proof it suffices to show that

(2.22)  $P\left[\lim_{m\to\infty} U_m = 0\right] = P\left[\lim_{m\to\infty} V_m = 0\right] = 1.$




In view of theorem 1A, to show that (2.22) holds, it suffices to show that

(2.23)  $\sum_{m=1}^{\infty} E[U_m^2] < \infty, \quad \sum_{m=1}^{\infty} E[V_m^2] < \infty.$

We prove that (2.23) holds by showing that for some constants $M_1$ and $M_2$

(2.24)  $E^{1/2}[U_m^2] \le \dfrac{M_1}{m}, \quad E^{1/2}[V_m^2] \le \dfrac{M_2}{m}$ for all integers $m$.

To prove (2.24), we note that

(2.25)  $U_m \le \dfrac{1}{m^r}\sum_{k=m^r+1}^{(m+1)^r - 1}|X_k|,$

from which it follows that

(2.26)  $E^{1/2}[U_m^2] \le \dfrac{1}{m^r}\sum_{k=m^r+1}^{(m+1)^r-1} E^{1/2}[X_k^2] \le M\,\dfrac{(m+1)^r - m^r}{m^r},$

in which we use $M$ as a bound for $E^{1/2}[X_n^2]$. By a calculus argument, using the law of the mean, one may show that for $r \ge 1$ and $m \ge 1$

(2.27)  $\left(1 + \dfrac{1}{m}\right)^r - 1 \le \dfrac{r 2^r}{m}.$

Consequently, (2.26) implies the first part of (2.24). Similarly,

(2.28)  $V_m \le \left[\left(1 + \dfrac{1}{m}\right)^r - 1\right]\dfrac{1}{m^r}\sum_{k=1}^{(m+1)^r}|X_k|,$

(2.29)  $E^{1/2}[V_m^2] \le \left[\left(1 + \dfrac{1}{m}\right)^r - 1\right]\dfrac{(m+1)^r}{m^r}\, M,$

from which one may infer the second part of (2.24). The proof of theorem 2B is now complete.


EXERCISES 


2.1. Random digits. Consider a discrete random variable $X$ uniformly distributed over the numbers 0 to $N - 1$ for any integer $N \ge 2$; that is, $P[X = k] = 1/N$ if $k = 0, 1, 2, \ldots, N - 1$. Let $\{X_n\}$ be a sequence of independent random variables identically distributed as $X$. For an integer $k$ from 0 to $N - 1$ define $f_k(n)$ as the fraction of the observations $X_1, X_2, \ldots, X_n$ equal to $k$. Prove that

$P\left[\lim_{n\to\infty} f_k(n) = \dfrac{1}{N}\right] = 1.$


2.2. The distribution of digits in the decimal expansion of a random number. Let $Y$ be a number chosen at random from the unit interval (that is, $Y$ is a random variable uniformly distributed over the interval 0 to 1). Let $X_1, X_2, \ldots$ be the successive digits in the decimal expansion of $Y$; that is,

$Y = \dfrac{X_1}{10} + \dfrac{X_2}{10^2} + \cdots + \dfrac{X_n}{10^n} + \cdots.$

Prove that the random variables $X_1, X_2, \ldots$ are independent discrete random variables uniformly distributed over the integers 0 to 9. Consequently, conclude that for any integer $k$ (say, the integer 7) the relative frequency of occurrence of $k$ in the decimal expansion of any number $Y$ in the unit interval is equal to $1/10$ for all numbers $Y$, except a set of numbers $Y$ constituting a set of probability zero. Does the fact that only 3's occur in the decimal expansion of $1/3$ contradict the assertion?

2.3. Convergence of the sample distribution function and the sample characteristic function of dependent random variables. Let $X_1, X_2, \ldots, X_n, \ldots$ be a sequence of random variables identically distributed as a random variable $X$. The sample distribution function $F_n(y)$ is defined as the fraction of observations among $X_1, X_2, \ldots, X_n$ which are less than or equal to $y$. The sample characteristic function $\phi_n(u)$ is defined by

$\phi_n(u) = M_n[e^{iuX}] = \dfrac{1}{n}\sum_{k=1}^{n} e^{iuX_k}.$

Show that $F_n(y)$ converges in quadratic mean to $F_X(y) = P[X \le y]$, as $n \to \infty$, if and only if

(2.30)  $\lim_{n\to\infty} \dfrac{1}{n}\sum_{k=1}^{n} P[X_k \le y, X_n \le y] = F_X^2(y).$

Show that $\phi_n(u)$ converges in quadratic mean to $\phi_X(u) = E[e^{iuX}]$ if and only if

(2.31)  $\lim_{n\to\infty} C_n = 0$, where $C_n = \dfrac{1}{n}\sum_{k=1}^{n} E[e^{iu(X_k - X_n)}] - |\phi_X(u)|^2.$

Prove that (2.30) and (2.31) hold if the random variables $X_1, X_2, \ldots$ are independent.

2.4. The law of large numbers does not hold for Cauchy distributed random variables. Let $X_1, X_2, \ldots, X_n, \ldots$ be a sequence of independent identically distributed random variables with probability density functions $f_{X_n}(x) = 1/[\pi(1 + x^2)]$. Show that no finite constant $m$ exists to which the sample means $(X_1 + \cdots + X_n)/n$ converge in probability.

2.5. Let $\{X_n\}$ be a sequence of independent random variables identically distributed as a random variable $X$ with finite mean. Show that for any bounded continuous function $f(\cdot)$ of a real variable $t$

(2.32)  $\lim_{n\to\infty} E\left[f\left(\dfrac{X_1 + \cdots + X_n}{n}\right)\right] = f(E[X]).$

Consequently, conclude that

(2.33)  $\lim_{n\to\infty} \int_0^1 \cdots \int_0^1 f\left(\dfrac{x_1 + \cdots + x_n}{n}\right) dx_1 \cdots dx_n = f\left(\tfrac12\right),$

(2.34)  $\lim_{n\to\infty} \sum_{k=0}^{n} f\left(\dfrac{k}{n}\right)\binom{n}{k} t^k (1 - t)^{n-k} = f(t), \quad 0 \le t \le 1.$

2.6. A probabilistic proof of Weierstrass' theorem. Extend (2.34) to show that to any continuous function $f(\cdot)$ on the interval $0 \le t \le 1$ there exists a sequence of polynomials $P_n(t)$ such that $\lim_{n\to\infty} P_n(t) = f(t)$ uniformly on $0 \le t \le 1$.
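The sum in (2.34) is the Bernstein polynomial of $f$ of degree $n$. It equals $E[f(S_n/n)]$ for $S_n$ binomial with parameters $n$ and $t$. The Python sketch below is our own illustration, and its test functions are hypothetical choices. It evaluates the sum and its maximum error over a grid in $0 \le t \le 1$. For $f(t) = t^2$ one has exactly $E[(S_n/n)^2] = t^2 + t(1 - t)/n$, which gives an easy check.

```python
from math import comb


def bernstein(f, n, t):
    # (2.34): sum_{k=0}^{n} f(k/n) C(n, k) t^k (1 - t)^(n - k) = E[f(S_n / n)], S_n ~ binomial(n, t).
    return sum(f(k / n) * comb(n, k) * t**k * (1 - t) ** (n - k) for k in range(n + 1))


def max_error(f, n, grid=201):
    # Largest |B_n f(t) - f(t)| over an equally spaced grid on [0, 1].
    ts = [i / (grid - 1) for i in range(grid)]
    return max(abs(bernstein(f, n, t) - f(t)) for t in ts)


def kink(t):
    # Continuous but not differentiable at t = 1/2.
    return abs(t - 0.5)


def square(t):
    return t * t


for n in (10, 50, 200):
    print(n, max_error(kink, n))
```

The error for the kinked function decreases roughly like $1/\sqrt{n}$. This is slow, but the convergence is uniform, as exercise 2.6 asserts.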


3. CONVERGENCE IN DISTRIBUTION OF A SEQUENCE
OF RANDOM VARIABLES

In this section we define the notion of convergence in distribution of a sequence of random variables $Z_1, Z_2, \ldots, Z_n, \ldots$ to a random variable $Z$, which is the notion of convergence most used in applications of probability theory. The notion of convergence in distribution of a sequence of random variables can be defined in a large number of equivalent ways, each of which is important for certain purposes. Instead of choosing any one of them as the definition, we prefer to introduce all the equivalent concepts simultaneously.


THEOREM 3A. DEFINITIONS AND THEOREMS CONCERNING CONVERGENCE IN DISTRIBUTION. For $n = 1, 2, \ldots$ let $Z_n$ be a random variable with distribution function $F_{Z_n}(\cdot)$ and characteristic function $\phi_{Z_n}(\cdot)$. Similarly, let $Z$ be a random variable with distribution function $F_Z(\cdot)$ and characteristic function $\phi_Z(\cdot)$. We define the sequence $\{Z_n\}$ as converging in distribution to the random variable $Z$, denoted by

(3.1)  $\lim_{n\to\infty} \mathcal{L}(Z_n) = \mathcal{L}(Z)$, or $\mathcal{L}(Z_n) \to \mathcal{L}(Z)$,

and read "the law of $Z_n$ converges to the law of $Z$", if any one (and consequently all) of the following equivalent statements holds:

(i) For every bounded continuous function $g(\cdot)$ of a real variable there is convergence of the expectation $E[g(Z_n)]$ to $E[g(Z)]$; that is, as $n$ tends to $\infty$,

(3.2)  $E[g(Z_n)] = \int_{-\infty}^{\infty} g(z)\, dF_{Z_n}(z) \to \int_{-\infty}^{\infty} g(z)\, dF_Z(z) = E[g(Z)].$

(ii) At every real number $u$ there is convergence of the characteristic functions; that is, as $n$ tends to $\infty$,

(3.3)  $E[e^{iuZ_n}] = \phi_{Z_n}(u) \to \phi_Z(u) = E[e^{iuZ}].$


SEC. 3 CONVERGENCE IN DISTRIBUTION 425 


(iii) At every two points $a$ and $b$, where $a < b$, at which the distribution function $F_Z(\cdot)$ of the limit random variable $Z$ is continuous, there is convergence of the probability functions over the interval $a$ to $b$; that is, as $n$ tends to $\infty$,

(3.4)  $P[a < Z_n \le b] = F_{Z_n}(b) - F_{Z_n}(a) \to F_Z(b) - F_Z(a) = P[a < Z \le b].$

(iv) At every real number $a$ that is a point of continuity of the distribution function $F_Z(\cdot)$ there is convergence of the distribution functions; that is, as $n$ tends to $\infty$, if $a$ is a continuity point of $F_Z(\cdot)$,

$P[Z_n \le a] = F_{Z_n}(a) \to F_Z(a) = P[Z \le a].$

(v) For every continuous function $g(\cdot)$, as $n$ tends to $\infty$,

$P[g(Z_n) \le y] = F_{g(Z_n)}(y) \to F_{g(Z)}(y) = P[g(Z) \le y]$

at every real number $y$ at which the distribution function $F_{g(Z)}(\cdot)$ is continuous.

Let us indicate briefly the significance of the most important of these statements. The practical meaning of convergence in distribution is expressed by (iii); the reader should compare the statement of the central limit theorem in section 5 of Chapter 8 to see that (iii) constitutes an exact mathematical formulation of the assertion that the probability law of $Z$ "approximates" that of $Z_n$. From the point of view of establishing in practice that a sequence of random variables converges in distribution, one uses (ii), which constitutes a criterion for convergence in distribution in terms of characteristic functions. Finally, (v) represents a theoretical fact of the greatest usefulness in applications, for it asserts that if $Z_n$ converges in distribution to $Z$ then a sequence of random variables $g(Z_n)$, obtained as functions of the $Z_n$, converges in distribution to $g(Z)$ if the function $g(\cdot)$ is continuous.

We defer the proof of the equivalence of these statements to section 5.
The Continuity Theorem of Probability Theory. The inversion formulas of section 3 of Chapter 9 prove that there is a one-to-one correspondence between distribution and characteristic functions; given a distribution function $F(\cdot)$ and its characteristic function

(3.5)  $\phi(u) = \int_{-\infty}^{\infty} e^{iux}\, dF(x),$

there is no other distribution function of which $\phi(\cdot)$ is the characteristic function. The results stated in theorem 3A show that the one-to-one correspondence between distribution and characteristic functions, regarded as a transformation between functions, is continuous in the following sense. A sequence of distribution functions $F_n(\cdot)$ converges to a distribution function $F(\cdot)$ at all points of continuity of $F(\cdot)$ if and only if the sequence of characteristic functions

(3.6)  $\phi_n(u) = \int_{-\infty}^{\infty} e^{iux}\, dF_n(x)$

converges at each real number $u$ to the characteristic function $\phi(\cdot)$ of $F(\cdot)$. Consequently, theorem 3A is often referred to as the continuity theorem of probability theory.

Theorem 3A has the following extremely important extension, of which the reader should be aware. Suppose that the sequence of characteristic functions $\phi_n(\cdot)$, defined by (3.6), has the property of converging at all real $u$ to a function $\phi(\cdot)$, which is continuous at $u = 0$. It may be shown that there is then a distribution function $F(\cdot)$, of which $\phi(\cdot)$ is the characteristic function. In view of this fact, the continuity theorem of probability theory is sometimes formulated in the following way:

Consider a sequence of distribution functions $F_n(x)$, with characteristic functions $\phi_n(u)$, defined by (3.6). In order that a distribution function $F(\cdot)$ exist such that

$\lim_{n\to\infty} F_n(x) = F(x)$

at all points $x$ which are continuity points of $F(x)$, it is necessary and sufficient that a function $\phi(u)$, continuous at $u = 0$, exist such that

$\lim_{n\to\infty} \phi_n(u) = \phi(u)$ at all real $u$.
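The condition of continuity at $u = 0$ cannot be dropped. As an illustration of ours, not taken from the text, let $F_n$ be the uniform distribution on $-n$ to $n$, with characteristic function $\phi_n(u) = \sin(nu)/(nu)$ and $\phi_n(0) = 1$. The limit of $\phi_n(u)$ is 1 at $u = 0$ and 0 elsewhere, which is discontinuous at 0. Correspondingly, $F_n(x) = (x + n)/(2n)$ for $|x| \le n$ tends to $1/2$ for every $x$, and the constant $1/2$ is not a distribution function. The Python sketch below tabulates both limits.

```python
import math


def uniform_cf(n, u):
    # Characteristic function of the uniform distribution on [-n, n].
    return 1.0 if u == 0 else math.sin(n * u) / (n * u)


def uniform_cdf(n, x):
    # Distribution function of the uniform distribution on [-n, n].
    return min(1.0, max(0.0, (x + n) / (2.0 * n)))


for n in (1, 10, 100, 10_000):
    print(n, uniform_cf(n, 0.0), uniform_cf(n, 0.5), uniform_cdf(n, -3.0), uniform_cdf(n, 3.0))
```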


LEMMA 3A. Let $X$ be a random variable whose mean $E[X]$ exists and is equal to 0 and whose variance $\sigma^2[X] = E[X^2]$ is finite. Then (i) for any $u$

(3.7)  $\phi_X(u) = 1 - \tfrac12 u^2 E[X^2] - u^2\int_0^1 dt\,(1 - t)\,E[X^2(e^{iutX} - 1)];$

(ii) for any $u$ such that $3u^2 E[X^2] \le 1$, $\log \phi_X(u)$ exists and satisfies

(3.8)  $\log\phi_X(u) = -\tfrac12 u^2 E[X^2] - u^2\int_0^1 dt\,(1 - t)\,E[X^2(e^{iutX} - 1)] + 3\theta u^4 E^2[X^2]$

for some number $\theta$ such that $|\theta| \le 1$. Further, if the third absolute moment $E[|X|^3]$ is finite, then for $u$ such that $3u^2 E[X^2] \le 1$

(3.9)  $\log\phi_X(u) = -\tfrac12 u^2 E[X^2] + \dfrac{\theta}{6}|u|^3 E[|X|^3] + 3\theta u^4 E^2[X^2],$

in which $\theta$ denotes, in each place where it occurs, some number with $|\theta| \le 1$.


Proof: Equation (3.7) follows immediately by integrating with respect to the distribution function of $X$ the easily verified expansion

(3.10)  $e^{iux} = 1 + iux - \tfrac12 u^2x^2 - u^2x^2\int_0^1 dt\,(1 - t)(e^{iutx} - 1).$

To show (3.8), we write [by (3.7)] that $\log\phi_X(u) = \log(1 - r)$, in which

(3.11)  $r = \tfrac12 u^2 E[X^2] + u^2\int_0^1 dt\,(1 - t)\,E[X^2(e^{iutX} - 1)].$

Now $|r| \le 3u^2E[X^2]/2$, so that $|r| \le \tfrac12$ if $u$ is such that $3u^2E[X^2] \le 1$. For any complex number $r$ of modulus $|r| \le \tfrac12$

(3.12)  $\log(1 - r) = -r\int_0^1 \dfrac{dt}{1 - rt}, \quad \log(1 - r) + r = -r^2\int_0^1 \dfrac{t\, dt}{1 - rt}, \quad |\log(1 - r) + r| \le |r|^2 \le 3u^4E^2[X^2],$

since $|1 - rt| \ge 1 - |rt| \ge \tfrac12$. The proof of (3.8) is complete.

Finally, (3.9) follows immediately from (3.8), since $|e^{iutx} - 1| \le |utx|$, so that

$\left|u^2\int_0^1 dt\,(1 - t)\,E[X^2(e^{iutX} - 1)]\right| \le |u|^3 E[|X|^3]\int_0^1 (1 - t)t\, dt = \dfrac{|u|^3}{6}E[|X|^3].$
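The inequality in (3.12) can be spot-checked with the principal branch of the complex logarithm. The Python sketch below is our own illustration; the sampling scheme and the seed are arbitrary. It draws random complex $r$ with $|r| \le 1/2$ and records the largest observed ratio $|\log(1 - r) + r| / |r|^2$, which should not exceed 1.

```python
import cmath
import math
import random


def worst_ratio(samples=100_000, seed=0):
    # Largest |log(1 - r) + r| / |r|^2 over random complex r with 0 < |r| <= 1/2,
    # using the principal logarithm cmath.log.
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(samples):
        radius = 0.5 * math.sqrt(rng.random())  # uniform over the disk |r| <= 1/2
        if radius == 0.0:
            continue
        r = cmath.rect(radius, 2.0 * math.pi * rng.random())
        worst = max(worst, abs(cmath.log(1 - r) + r) / radius ** 2)
    return worst


print(worst_ratio())
```

Expanding $\log(1 - r) + r = -\sum_{k \ge 2} r^k/k$ shows that the supremum of the ratio over the disk is $4(\log 2 - \tfrac12) \approx 0.77$, reached at $r = \tfrac12$. The printed maximum therefore cannot exceed that value, which leaves room below the bound 1 used in the proof.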


LEMMA 3B. In the same way that (3.7) and (3.8) are obtained, one may obtain expansions for the characteristic function of a random variable $Y$ whose mean $E[Y]$ exists:

(3.13)  $\phi_Y(u) = 1 + iuE[Y] + iu\int_0^1 dt\, E[Y(e^{iutY} - 1)],$

(3.14)  $\log\phi_Y(u) = iuE[Y] + iu\int_0^1 dt\, E[Y(e^{iutY} - 1)] + 9\theta u^2 E^2[|Y|]$

for $u$ such that $6|u|E[|Y|] \le 1$.

▶ Example 3A. Asymptotic normality of binomial random variables. In section 2 of Chapter 6 it is stated that a binomial random variable is approximately normally distributed. This assertion may be given a precise formulation in terms of the notion of convergence in distribution. Let $S_n$ be the number of successes in $n$ independent repeated Bernoulli trials, with probability $p$ of success at each trial, and let

$Z_n = \dfrac{S_n - E[S_n]}{\sigma[S_n]} = \dfrac{S_n - np}{\sqrt{npq}}.$

Let $Z$ be any random variable that is normally distributed with mean 0 and variance 1. We now show that the sequence $\{Z_n\}$ converges in distribution to $Z$. To prove this assertion, we first write the characteristic function of $Z_n$ in the form

(3.15)  $\phi_{Z_n}(u) = \exp\left(-iu\sqrt{np/q}\right)\left(q + p\exp\left(iu/\sqrt{npq}\right)\right)^n = \left[q\exp\left(-iu\sqrt{p/nq}\right) + p\exp\left(iu\sqrt{q/np}\right)\right]^n.$

Therefore,

(3.16)  $\log\phi_{Z_n}(u) = n\log\phi_X(u),$

where we define

(3.17)  $\phi_X(u) = q\exp\left(-iu\sqrt{p/nq}\right) + p\exp\left(iu\sqrt{q/np}\right).$

Now $\phi_X(u)$ is the characteristic function of a random variable $X$ with mean, mean square, and absolute third moment given by

(3.18)  $E[X] = q\left(-\sqrt{p/nq}\right) + p\sqrt{q/np} = 0,$
$E[X^2] = q\left(\sqrt{p/nq}\right)^2 + p\left(\sqrt{q/np}\right)^2 = \dfrac{p + q}{n} = \dfrac{1}{n},$
$E[|X|^3] = q\left(\sqrt{p/nq}\right)^3 + p\left(\sqrt{q/np}\right)^3 = \dfrac{p^2 + q^2}{n^{3/2}\sqrt{pq}}.$

By (3.9), we have the expansion for $\log\phi_X(u)$, valid for $u$ such that $3u^2E[X^2] = 3u^2/n \le 1$:

(3.19)  $\log\phi_X(u) = -\dfrac{u^2}{2n} + \dfrac{\theta}{6}|u|^3\dfrac{p^2 + q^2}{n^{3/2}\sqrt{pq}} + 3\theta\dfrac{u^4}{n^2},$

in which $\theta$ is some number such that $|\theta| \le 1$.

In view of (3.16) and (3.19), we see that for fixed $u \ne 0$ and for $n$ so large that $n \ge 3u^2$,

(3.20)  $\log\phi_{Z_n}(u) = -\dfrac{u^2}{2} + \dfrac{\theta}{6}|u|^3\dfrac{p^2 + q^2}{\sqrt{npq}} + 3\theta\dfrac{u^4}{n},$

which tends to $\log\phi_Z(u) = -\tfrac12 u^2$ as $n$ tends to infinity. By statement (ii) of theorem 3A, it follows that the sequence $\{Z_n\}$ converges in distribution to $Z$. ◀
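The convergence can be watched directly. The Python sketch below is our own illustration; $p = 0.3$ and $u = 1.5$ are arbitrary choices. It evaluates $\log\phi_{Z_n}(u) = n\log\phi_X(u)$ exactly from (3.16) and (3.17), using the principal logarithm. It then compares the distance from $-u^2/2$ with the bound $\frac{|u|^3(p^2+q^2)}{6\sqrt{npq}} + \frac{3u^4}{n}$ read off from (3.20), which applies once $n \ge 3u^2$.

```python
import cmath
import math


def log_cf_standardized_binomial(u, n, p):
    # By (3.16)-(3.17), log phi_{Z_n}(u) = n * log phi_X(u), where
    # phi_X(u) = q exp(-iu sqrt(p/(nq))) + p exp(iu sqrt(q/(np))); principal logarithm used.
    q = 1.0 - p
    phi_x = q * cmath.exp(-1j * u * math.sqrt(p / (n * q))) + p * cmath.exp(1j * u * math.sqrt(q / (n * p)))
    return n * cmath.log(phi_x)


def error_bound(u, n, p):
    # Bound on |log phi_{Z_n}(u) + u^2/2| read off from (3.20); requires n >= 3u^2.
    q = 1.0 - p
    return abs(u) ** 3 * (p * p + q * q) / (6.0 * math.sqrt(n * p * q)) + 3.0 * u ** 4 / n


u, p = 1.5, 0.3
for n in (10, 100, 1000, 10_000):
    actual = abs(log_cf_standardized_binomial(u, n, p) + 0.5 * u * u)
    print(n, actual, error_bound(u, n, p))
```

The actual error stays within the bound, and both shrink as $n$ grows. This is the numerical face of statement (ii) of theorem 3A.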


Characteristic functions may be used to prove theorems concerning convergence in probability to a constant. In particular, the reader may easily verify the following lemma.

LEMMA 3C. A sequence of random variables $Z_n$ converges in probability to 0 if and only if it converges in distribution to 0, which is the case if and only if, for every real number $u$,

(3.21)  $\lim_{n\to\infty} \phi_{Z_n}(u) = 1.$

THEOREM 3B. The law of large numbers for a sequence of independent, identically distributed random variables $X_1, X_2, \ldots, X_n, \ldots$ with common finite mean $m$. As $n$ tends to $\infty$, the sample mean $(1/n)(X_1 + X_2 + \cdots + X_n)$ converges in probability to the mean $m = E[X]$, in which $X$ is a random variable obeying the common probability law of $X_1, X_2, \ldots, X_n, \ldots$.

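Proof: Define $Y = X - E[X]$ and

$$Z_n = \frac{1}{n}\left[(X_1 + X_2 + \cdots + X_n) - nE[X]\right].$$

To prove that the sample mean $(1/n)(X_1 + X_2 + \cdots + X_n)$ converges in probability to the mean $E[X]$, it suffices (by lemma 3C) to show that $Z_n$ converges in distribution to 0. Now, for a given value of $u$ and for $n$ so large that $n > 6|u|E[|Y|]$,

$$\log\phi_{Z_n}(u) = n\log\phi_Y\!\left(\frac{u}{n}\right) = iu\int_0^1 dt\int_{-\infty}^{\infty} dF_Y(y)\,y\left(e^{iuty/n} - 1\right) + \theta\,\frac{u^2}{n}E^2[|Y|], \tag{3.22}$$

which tends to 0 as $n$ tends to $\infty$, since, for each fixed $t$, $u$, and $y$, $e^{iuty/n}$ tends to 1 as $n$ tends to $\infty$, while the integrand is bounded in absolute value by $2|y|$, which is integrable with respect to $F_Y$. The proof is complete.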
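To see (3.22) at work, the following Python sketch assumes, purely for illustration, that $X$ is exponentially distributed with mean 1, so that $Y = X - 1$ has characteristic function $e^{-iu}/(1 - iu)$. It evaluates $\phi_{Z_n}(u) = [\phi_Y(u/n)]^n$ for increasing $n$; by lemma 3C, values approaching 1 are the characteristic-function form of convergence in probability to 0. The function names are hypothetical.

```python
import cmath


def phi_Y(v: float) -> complex:
    """Characteristic function of Y = X - 1 when X is exponential with mean 1 (assumed example)."""
    return cmath.exp(-1j * v) / (1.0 - 1j * v)


def phi_Zn(u: float, n: int) -> complex:
    """phi_{Z_n}(u) = [phi_Y(u/n)]^n, where Z_n = (Y_1 + ... + Y_n)/n."""
    return phi_Y(u / n) ** n


if __name__ == "__main__":
    for n in (10, 1000, 100000):
        print(n, abs(phi_Zn(2.0, n) - 1.0))  # shrinks toward 0 as n grows
```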


430 SEQUENCES OF RANDOM VARIABLES cH. 10 


EXERCISES 
3.1. Prove lemma 3C. 


3.2. Let $X_1, X_2, \ldots, X_n$ be independent random variables, each assuming the values $+1$ and $-1$ with probability $\tfrac{1}{2}$ each. Let $Y_n = \sum_{j=1}^{n} X_j/2^j$. Find the characteristic function of $Y_n$ and show that, as $n$ tends to $\infty$, for each $u$, $\phi_{Y_n}(u)$ tends to the characteristic function of a random variable $Y$ uniformly distributed over the interval $-1$ to $1$. Consequently, evaluate
Р[-2 < Y, <}, P < Y, <] approximately. 


3.3. Let $X_1, X_2, \ldots, X_n$ be independent random variables, identically distributed as the random variable $X$. For $n = 1, 2, \ldots$, let

$$S_n = X_1 + X_2 + \cdots + X_n, \qquad Z_n = \frac{S_n - E[S_n]}{\sigma[S_n]}.$$

Assuming that $X$ is (i) binomial distributed with parameters $n = 6$ and $p = \ldots$, (ii) Poisson distributed with parameter $\lambda = 2$, (iii) $\chi^2$ distributed with $\nu = 2$ degrees of freedom, for each real number … $< 20]$ approximately.
3.4. For any integer $r$ and $0 < \ldots$ … $\lim_{p\to 0}\phi_{\ldots}(u) = \phi_Z(u)$. State in words the meaning of this result.


3.5. Let $Z_n$ be binomial distributed with parameters $n$ and $p = \lambda/n$, in which $\lambda > 0$ is a fixed constant. Let $Z$ be Poisson distributed with parameter $\lambda$. For each $u$, show that $\lim_{n\to\infty}\phi_{Z_n}(u) = \phi_Z(u)$. State in words the meaning of this result.
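A quick numerical check of exercise 3.5 (not a proof) compares the binomial characteristic function $(1 - p + pe^{iu})^n$ with the Poisson characteristic function $\exp[\lambda(e^{iu} - 1)]$ when $p = \lambda/n$. The values $\lambda = 3$ and $u = 1$, and the function names, are illustrative assumptions.

```python
import cmath


def phi_binomial(u: float, n: int, p: float) -> complex:
    """Characteristic function (1 - p + p e^{iu})^n of a binomial(n, p) variable."""
    return (1.0 - p + p * cmath.exp(1j * u)) ** n


def phi_poisson(u: float, lam: float) -> complex:
    """Characteristic function exp(lam (e^{iu} - 1)) of a Poisson(lam) variable."""
    return cmath.exp(lam * (cmath.exp(1j * u) - 1.0))


if __name__ == "__main__":
    lam, u = 3.0, 1.0
    for n in (10, 100, 10000, 1000000):
        print(n, abs(phi_binomial(u, n, lam / n) - phi_poisson(u, lam)))
```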


3.6. Let $Z$ be a random variable Poisson distributed with parameter $\lambda$. By use of characteristic functions, show that as $\lambda$ tends to $\infty$, $(Z - \lambda)/\sqrt{\lambda}$ converges in distribution to $Y$, in which $Y$ is normally distributed with mean 0 and variance 1.

3.7. Show that $\operatorname{plim}_{n\to\infty} X_n = X$ implies that $\lim_{n\to\infty} \ldots(X_n) = \ldots(X)$.


4. THE CENTRAL LIMIT THEOREM 


A sequence of jointly distributed random variables $X_1, X_2, \ldots, X_n, \ldots$ with finite means and variances is said to obey the (classical) central limit theorem if the sequence $Z_1, Z_2, \ldots, Z_n, \ldots$ defined by

$$Z_n = \frac{S_n - E[S_n]}{\sigma[S_n]}, \qquad S_n = X_1 + X_2 + \cdots + X_n, \tag{4.1}$$


SEC. 4 THE CENTRAL LIMIT THEOREM 431 


converges in distribution to a random variable that is normally distributed with mean 0 and variance 1. In terms of characteristic functions, the sequence $\{X_n\}$ obeys the central limit theorem if for every real number $u$

$$\lim_{n\to\infty}\phi_{Z_n}(u) = e^{-u^2/2}. \tag{4.2}$$

The random variables $Z_1, Z_2, \ldots, Z_n, \ldots$ are called the sequence of normalized consecutive sums of the sequence $X_1, X_2, \ldots, X_n, \ldots$.

That the central limit theorem is true under fairly unrestrictive conditions on the random variables $X_1, X_2, \ldots$ was already surmised by Laplace and Gauss in the early 1800's. However, the first satisfactory conditions, backed by a rigorous proof, for the validity of the central limit theorem were given by Lyapunov in 1901. In the 1920's and 1930's the method of characteristic functions was used to extend the theorem in several directions and to obtain fairly unrestrictive necessary and sufficient conditions for its validity in the case in which the random variables $X_1, X_2, \ldots$ are independent. More recent years have seen extensive work on extending the central limit theorem to the case of dependent random variables.

The reader is referred to the treatises of B. V. Gnedenko and A. N. 
Kolmogorov, Limit Distributions for Sums of Independent Random 
Variables, Addison-Wesley, Cambridge, Mass., 1954, and M. Loéve, 
Probability Theory, Van Nostrand, New York, 1955, for a definitive 
treatment of the central limit theorem and its extensions. 

From the point of view of the applications of probability theory, there are two main versions of the central limit theorem that one should have at his command. One should know conditions for the validity of the central limit theorem in the cases in which (i) the random variables $X_1, X_2, \ldots$ are independent and identically distributed and (ii) the random variables $X_1, X_2, \ldots$ are independent but not identically distributed.


THEOREM 4A. THE CENTRAL LIMIT THEOREM FOR INDEPENDENT IDENTICALLY DISTRIBUTED RANDOM VARIABLES WITH FINITE MEANS AND VARIANCES. For $n = 1, 2, \ldots$, let $X_n$ be identically distributed as the random variable $X$, with finite mean $E[X]$ and standard deviation $\sigma[X]$. Let the sequence $\{X_n\}$ be independent, and let $Z_n$ be defined by (4.1) or, more explicitly,

$$Z_n = \frac{(X_1 + X_2 + \cdots + X_n) - nE[X]}{\sqrt{n}\,\sigma[X]}. \tag{4.3}$$

Then (4.2) will hold.
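Theorem 4A can be illustrated by simulation. The Python sketch below assumes, for illustration only, that each $X_k$ is uniformly distributed on $(0, 1)$, so that $E[X] = 1/2$ and $\sigma[X] = 1/\sqrt{12}$; it forms $Z_n$ as in (4.3) and estimates $P[Z_n \le z]$ by Monte Carlo for comparison with the normal distribution function $\Phi(z)$. The function names, the seed, and the sample sizes are arbitrary choices.

```python
import math
import random
from statistics import NormalDist


def normalized_sum(n: int, rng: random.Random) -> float:
    """Z_n of (4.3) when each X_k is uniform on (0, 1): E[X] = 1/2, sigma[X] = sqrt(1/12)."""
    s = sum(rng.random() for _ in range(n))
    return (s - n * 0.5) / (math.sqrt(n) * math.sqrt(1.0 / 12.0))


def estimate_cdf(z: float, n: int, trials: int, seed: int = 2024) -> float:
    """Monte Carlo estimate of P[Z_n <= z] from `trials` independent copies of Z_n."""
    rng = random.Random(seed)
    hits = sum(normalized_sum(n, rng) <= z for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for z in (-1.0, 0.0, 1.0, 2.0):
        print(f"z = {z:+.1f}   estimate = {estimate_cdf(z, 30, 10000):.4f}   "
              f"Phi(z) = {NormalDist().cdf(z):.4f}")
```

The estimates carry Monte Carlo error of order $1/\sqrt{\text{trials}}$, so agreement to two or three decimals is what one should expect here.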

THEOREM 4B. THE CENTRAL LIMIT THEOREM FOR INDEPENDENT RANDOM VARIABLES WITH FINITE MEANS AND (2 + δ)th CENTRAL MOMENT, FOR SOME δ > 0. For $n = 1, 2, \ldots$, let $X_n$ be a random variable with finite mean $E[X_n]$ and finite $(2+\delta)$th central moment $\mu(2+\delta; n) = E\left[|X_n - E[X_n]|^{2+\delta}\right]$. Let the sequence $\{X_n\}$ be independent, and let $Z_n$ be defined by (4.1). Then (4.2) will hold if

$$\lim_{n\to\infty}\frac{1}{\sigma^{2+\delta}[S_n]}\sum_{k=1}^{n}\mu(2+\delta; k) = 0. \tag{4.4}$$
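Lyapunov's condition (4.4) is easy to evaluate for concrete sequences. The Python sketch below takes $\delta = 1$ and assumes, for illustration only, that $X_k$ is uniformly distributed on $(-a_k, a_k)$, so that $\operatorname{Var}[X_k] = a_k^2/3$ and $\mu(3; k) = a_k^3/4$. With $a_k = \sqrt{k}$ the ratio decreases toward 0, so theorem 4B applies; with $a_k = 2^k$ the last summand dominates and the ratio stays near $27/28$, so (4.4) fails. The function name is hypothetical.

```python
from typing import Callable


def lyapunov_ratio(n: int, half_width: Callable[[int], float]) -> float:
    """Quantity inside the limit in (4.4), with delta = 1.

    Assumed example: X_k uniform on (-a_k, a_k) with a_k = half_width(k), so that
    Var[X_k] = a_k**2 / 3 and mu(3; k) = E|X_k|**3 = a_k**3 / 4.
    """
    variance_sum = 0.0      # sigma^2[S_n]
    third_moment_sum = 0.0  # sum of mu(3; k) for k = 1, ..., n
    for k in range(1, n + 1):
        a = half_width(k)
        variance_sum += a * a / 3.0
        third_moment_sum += a ** 3 / 4.0
    return third_moment_sum / variance_sum ** 1.5


if __name__ == "__main__":
    for n in (10, 20, 40):
        print(n,
              round(lyapunov_ratio(n, lambda k: k ** 0.5), 4),  # decreases toward 0
              round(lyapunov_ratio(n, lambda k: 2.0 ** k), 4))  # stays near 27/28
```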


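We turn now to the proof of theorem 4A. Let $X_1, X_2, \ldots$ be independent random variables, identically distributed as the random variable $X$ with mean 0 and variance $\sigma^2$ (there is no loss of generality in assuming the mean is 0, since $Z_n$ is unchanged when every $X_k$ is replaced by $X_k - E[X]$). Let $Z_n$ be their normalized sum, given by (4.1). The characteristic function of $Z_n$ may be written

$$\phi_{Z_n}(u) = \left[\phi_X\!\left(\frac{u}{\sigma[S_n]}\right)\right]^n. \tag{4.5}$$

Now $\sigma[S_n] = \sqrt{n}\,\sigma$ tends to $\infty$ as $n$ tends to $\infty$. Therefore, for each fixed $u$, $\log\phi_X(u/\sigma[S_n])$ exists (by lemma 3A) when $n > 3u^2$. For $n$ as large as this, using the expansion given by (3.8),

$$\log\phi_{Z_n}(u) = n\left[-\frac{u^2}{2n} - \frac{u^2}{n\sigma^2}\int_0^1 (1-t)\,E\!\left[X^2\left(e^{iutX/(\sigma\sqrt{n})} - 1\right)\right]dt + \frac{3\theta u^4}{n^2}\right]. \tag{4.6}$$

Theorem 4A will be proved if we prove that

$$\lim_{n\to\infty}\log\phi_{Z_n}(u) = -\tfrac{1}{2}u^2, \tag{4.7}$$

that is, that the middle term in (4.6) tends to 0. For real numbers $x$, $t$, and $u$, let $g(x, t, u) = x^2\left(e^{iutx} - 1\right)$. For any $M > 0$,

$$E[g(X, t, u)] = \int_{|x|<M} g(x, t, u)\,dF_X(x) + \int_{|x|\ge M} g(x, t, u)\,dF_X(x). \tag{4.8}$$

Since $|e^{iw} - 1| \le \min(|w|, 2)$ for every real $w$, it follows that for any $M > 0$ and real numbers $u$ and $t$

$$|E[g(X, t, u)]| \le M|ut|\int_{|x|<M} x^2\,dF_X(x) + 2\int_{|x|\ge M} x^2\,dF_X(x). \tag{4.9}$$

Then, since $0 \le 1 - t \le 1$ for $0 \le t \le 1$,

$$\left|\int_0^1 (1-t)\,E\!\left[g\!\left(X, t, \frac{u}{\sigma\sqrt{n}}\right)\right]dt\right| \le \frac{M|u|}{\sigma\sqrt{n}}\,\sigma^2 + 2\int_{|x|\ge M} x^2\,dF_X(x), \tag{4.10}$$

which tends to 0, as we let first $n$ tend to $\infty$ and then $M$ tend to $\infty$. The proof of the central limit theorem for identically distributed independent random variables with finite variances is complete.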

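We next prove the central limit theorem under Lyapunov's condition. For $k = 1, 2, \ldots$, let $X_k$ be a random variable with mean 0, finite variance $\sigma_k^2$, and $(2+\delta)$th central moment $\mu(2+\delta; k)$. We have the following expansion of the logarithm of its characteristic function, for $u$ such that $3u^2\sigma_k^2 \le 1$:

$$\log\phi_{X_k}(u) = -\tfrac{1}{2}u^2\sigma_k^2 + 2\theta|u|^{2+\delta}\mu(2+\delta; k) + 3\theta u^4\sigma_k^4. \tag{4.11}$$

To prove (4.11), merely use in (3.8) the inequality $|e^{iw} - 1| \le 2|w|^{\delta}$, valid for any real number $w$ and $0 < \delta \le 1$.

Now, (4.4) and theoretical exercise 4.3 imply that

$$\max_{1\le k\le n}\left(\frac{\sigma_k^2}{\sigma^2[S_n]}\right)^{(2+\delta)/2} \le \max_{1\le k\le n}\frac{\mu(2+\delta; k)}{\sigma^{2+\delta}[S_n]} \le \frac{1}{\sigma^{2+\delta}[S_n]}\sum_{k=1}^{n}\mu(2+\delta; k) \to 0. \tag{4.12}$$

Then, for any fixed $u$ it holds for $n$ sufficiently large that $3u^2\sigma_k^2/\sigma^2[S_n] \le 1$ for all $k = 1, 2, \ldots, n$. Therefore, $\log\phi_{Z_n}(u)$ exists and is given by

$$\log\phi_{Z_n}(u) = \sum_{k=1}^{n}\log\phi_{X_k}\!\left(\frac{u}{\sigma[S_n]}\right) = -\frac{u^2}{2}\sum_{k=1}^{n}\frac{\sigma_k^2}{\sigma^2[S_n]} + 2\theta|u|^{2+\delta}\sum_{k=1}^{n}\frac{\mu(2+\delta; k)}{\sigma^{2+\delta}[S_n]} + 3\theta u^4\sum_{k=1}^{n}\frac{\sigma_k^4}{\sigma^4[S_n]}. \tag{4.13}$$

The first sum in (4.13) is equal to 1, whereas the second sum tends to 0 by Lyapunov's condition, as does the third sum, since

$$\sum_{k=1}^{n}\frac{\sigma_k^4}{\sigma^4[S_n]} \le \max_{1\le k\le n}\frac{\sigma_k^2}{\sigma^2[S_n]}\sum_{k=1}^{n}\frac{\sigma_k^2}{\sigma^2[S_n]} = \max_{1\le k\le n}\frac{\sigma_k^2}{\sigma^2[S_n]},$$

which tends to 0 by (4.12). The proof of the central limit theorem under Lyapunov's condition is complete.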


THEORETICAL EXERCISES 


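4.1. Show that the central limit theorem holds for independent random variables $X_1, X_2, \ldots$ with finite means and finite variances obeying Lindeberg's condition: for every $\epsilon > 0$

$$\lim_{n\to\infty}\frac{1}{\sigma^2[S_n]}\sum_{k=1}^{n}\int_{|y|\ge\epsilon\sigma[S_n]} y^2\,dF_{Y_k}(y) = 0, \tag{4.14}$$

in which $Y_k = X_k - E[X_k]$. Hint: In the proof of theorem 4A, let $M = \epsilon\sigma[S_n]$, replacing $\sigma\sqrt{n}$ by $\sigma[S_n]$. Obtain thereby an estimate for each $\log\phi_{Y_k}(u/\sigma[S_n])$. Add these estimates to obtain an estimate for $\log\phi_{Z_n}(u)$, as in (4.13).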


4.2. Prove the law of large numbers under Markov's condition. Hint: Adapt the proof of the central limit theorem under Lyapunov's condition, using the expansions (3.13).
4.3. Jensen's inequality and its consequences. Let $X$ be a random variable, and let $I$ be a (possibly infinite) interval such that, with probability one, $X$ takes its values in $I$; that is, $P[X \text{ lies in } I] = 1$. Let $g(\cdot)$ be a function of a real variable that is twice differentiable on $I$ and whose second derivative satisfies $g''(x) \ge 0$ for all $x$ in $I$. The function $g(\cdot)$ is then said to be convex on $I$. Show that the following inequality (Jensen's inequality) holds:

$$g(E[X]) \le E[g(X)]. \tag{4.15}$$

Hint: Show by Taylor's theorem that $g(x) \ge g(x_0) + g'(x_0)(x - x_0)$. Let $x_0 = E[X]$ and take the expectation of both sides of the inequality. Deduce from (4.15) that for any $r \ge 1$ and $s > 0$

$$|E[X]|^r \le E^r[|X|] \le E[|X|^r], \tag{4.16}$$

$$E^r[|X|^s] \le E[|X|^{rs}]. \tag{4.17}$$

Conclude from (4.17) that if $0 < r_1 < r_2$ then

$$E^{1/r_1}[|X|^{r_1}] \le E^{1/r_2}[|X|^{r_2}]. \tag{4.18}$$

In particular, conclude that

$$E[|X|] \le E^{1/2}[|X|^2] \le E^{1/3}[|X|^3] \le \cdots. \tag{4.19}$$
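The chain (4.19) can be checked numerically for any particular distribution. The Python sketch below assumes an arbitrary three-point distribution (the values, probabilities, and function name are illustrative only) and prints $E^{1/r}[|X|^r]$ for $r = 1, \ldots, 6$; the printed values should never decrease, and they approach $\max|x| = 3$ as $r$ grows.

```python
def moment_norm(values, probs, r: float) -> float:
    """E^{1/r}[|X|^r] for a discrete X with P[X = values[i]] = probs[i]."""
    return sum(p * abs(x) ** r for x, p in zip(values, probs)) ** (1.0 / r)


if __name__ == "__main__":
    values = [-2.0, 0.5, 3.0]  # an arbitrary illustrative distribution
    probs = [0.2, 0.5, 0.3]
    mean = sum(p * x for x, p in zip(values, probs))
    print("|E[X]| =", abs(mean), "  E|X| =", moment_norm(values, probs, 1))  # (4.16), r = 1
    for r in range(1, 7):
        print(r, round(moment_norm(values, probs, r), 4))  # nondecreasing, as in (4.19)
```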


4.4. Let $\{U_n\}$ be a sequence of independent random variables, each uniformly distributed on the interval 0 to $\pi$. Let $\{A_n\}$ be a sequence of positive constants. State conditions under which the sequence $X_n = A_n\cos U_n$ obeys the central limit theorem.


5. PROOFS OF THEOREMS CONCERNING 
CONVERGENCE IN DISTRIBUTION 


In this section we prove the … showing that each implies its successor. …

To prove that (ii) implies (iii), we make use of the basic formula (3.6) of Chapter 9. For any $d > 0$ define the function $g_d(\cdot)$ for any real number $z$ by

$$g_d(z) = \begin{cases} 1 & \text{if } a \le z \le b,\\[2pt] 1 - \dfrac{a - z}{d} & \text{if } a - d \le z \le a,\\[2pt] 1 - \dfrac{z - b}{d} & \text{if } b \le z \le b + d,\\[2pt] 0 & \text{otherwise.}\end{cases}$$


SEC. 5 PROOFS OF THEOREMS 435 


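The function $g_d(\cdot)$ is continuous and integrable. Its Fourier transform $\gamma_d(\cdot)$ is given for any $u$ by

$$\gamma_d(u) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-iuz}g_d(z)\,dz. \tag{5.1}$$

Therefore, integrating by parts,

$$\gamma_d(u) = \frac{1}{2\pi iud}\left(\int_{a-d}^{a} e^{-iuz}\,dz - \int_{b}^{b+d} e^{-iuz}\,dz\right) = \frac{1}{2\pi du^2}\left(e^{-iua} - e^{-iu(a-d)} + e^{-iub} - e^{-iu(b+d)}\right). \tag{5.2}$$

Thus we see that the Fourier transform $\gamma_d(\cdot)$ is integrable. Consequently, from (3.6) of Chapter 9 we have

$$\int_{-\infty}^{\infty} g_d(z)\,dF_{Z_n}(z) = \int_{-\infty}^{\infty}\gamma_d(u)\,\phi_{Z_n}(u)\,du. \tag{5.3}$$

By letting $n$ tend to $\infty$ in (5.3) and using the hypothesis of statement (ii) (the integrand on the right is dominated by the integrable function $|\gamma_d(u)|$, since $|\phi_{Z_n}(u)| \le 1$), we obtain for any $d > 0$, as $n$ tends to $\infty$,

$$\int_{-\infty}^{\infty} g_d(z)\,dF_{Z_n}(z) \to \int_{-\infty}^{\infty} g_d(z)\,dF_Z(z). \tag{5.4}$$

Next, define the function $g_d^*(\cdot)$ for any $z$ by (taking $0 < d < (b - a)/2$)

$$g_d^*(z) = \begin{cases} 1 & \text{if } a + d \le z \le b - d,\\[2pt] \dfrac{z - a}{d} & \text{if } a \le z \le a + d,\\[2pt] \dfrac{b - z}{d} & \text{if } b - d \le z \le b,\\[2pt] 0 & \text{otherwise.}\end{cases}$$

By the same argument, one may prove that (5.4) holds for $g_d^*(\cdot)$.

Now, the expectations of the functions $g_d(\cdot)$ and $g_d^*(\cdot)$ clearly straddle the quantity $F_{Z_n}(b) - F_{Z_n}(a)$:

$$\int_{-\infty}^{\infty} g_d^*(z)\,dF_{Z_n}(z) \le F_{Z_n}(b) - F_{Z_n}(a) \le \int_{-\infty}^{\infty} g_d(z)\,dF_{Z_n}(z). \tag{5.5}$$

From (5.5), letting $n$ tend to $\infty$, we obtain

$$\int_{-\infty}^{\infty} g_d^*(z)\,dF_Z(z) \le \liminf_{n\to\infty}\left[F_{Z_n}(b) - F_{Z_n}(a)\right] \le \limsup_{n\to\infty}\left[F_{Z_n}(b) - F_{Z_n}(a)\right] \le \int_{-\infty}^{\infty} g_d(z)\,dF_Z(z). \tag{5.6}$$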


Now, let $d$ tend to 0 in (5.6); since

$$0 \le F_Z(b) - F_Z(a) - \int_{-\infty}^{\infty} g_d^*(z)\,dF_Z(z) \le F_Z(a + d) - F_Z(a) + F_Z(b) - F_Z(b - d) \to 0,$$

$$0 \le \int_{-\infty}^{\infty} g_d(z)\,dF_Z(z) - \left[F_Z(b) - F_Z(a)\right] \le F_Z(a) - F_Z(a - d) + F_Z(b + d) - F_Z(b) \to 0 \tag{5.7}$$

as $d$ tends to 0, it follows that (3.4) holds. Note that (5.7) would not hold if we did not require $a$ and $b$ to be points at which $F_Z(\cdot)$ is continuous.
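The straddling inequality (5.5), and the role of continuity points in (5.7), can be checked on an empirical distribution. The Python sketch below implements $g_d$ and $g_d^*$ as defined above and returns the three quantities in (5.5) for the empirical distribution of a finite sample; the function names and the small sample are illustrative assumptions.

```python
def g_outer(z: float, a: float, b: float, d: float) -> float:
    """g_d: 1 on [a, b], linear ramps on [a - d, a] and [b, b + d], 0 elsewhere."""
    if a <= z <= b:
        return 1.0
    if a - d < z < a:
        return 1.0 - (a - z) / d
    if b < z < b + d:
        return 1.0 - (z - b) / d
    return 0.0


def g_inner(z: float, a: float, b: float, d: float) -> float:
    """g_d^*: 1 on [a + d, b - d], linear ramps on [a, a + d] and [b - d, b], 0 elsewhere.

    Assumes 0 < d < (b - a) / 2 so that the two ramps do not overlap.
    """
    if a + d <= z <= b - d:
        return 1.0
    if a < z < a + d:
        return (z - a) / d
    if b - d < z < b:
        return (b - z) / d
    return 0.0


def straddle(sample, a: float, b: float, d: float):
    """Return (E[g_d^*], F(b) - F(a), E[g_d]) for the empirical law of `sample`."""
    m = len(sample)
    lower = sum(g_inner(z, a, b, d) for z in sample) / m
    middle = sum(a < z <= b for z in sample) / m  # F(b) - F(a) = P[a < Z <= b]
    upper = sum(g_outer(z, a, b, d) for z in sample) / m
    return lower, middle, upper


if __name__ == "__main__":
    sample = [-1.0, -0.3, 0.0, 0.1, 0.45, 0.5, 0.9, 1.2, 2.0]
    for d in (0.25, 0.05, 1e-6):
        print(d, straddle(sample, a=0.0, b=1.0, d=d))
```

Because this sample has an atom at $a = 0$, the upper value stays at $5/9$ while $F(b) - F(a) = 4/9$ as $d \to 0$, which is exactly the failure of (5.7) at a point of discontinuity described above.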

We next prove that (iii) implies (iv). Let $M$ be a positive number such that $F(\cdot)$ is continuous at $M$ and at $-M$. Then, for any real number $a$,

$$|F_n(a) - F(a)| \le |F_n(a) - F_n(-M) - F(a) + F(-M)| + F_n(-M) + F(-M). \tag{5.8}$$

Since statement (iii) holds, it follows that if $a$ is a continuity point of $F(\cdot)$,

$$\limsup_{n\to\infty}|F_n(a) - F(a)| \le F(-M) + \limsup_{n\to\infty}F_n(-M). \tag{5.9}$$

Now, also by (iii), since $F_n(M) - F_n(-M)$ tends to $F(M) - F(-M)$,

$$\limsup_{n\to\infty}F_n(-M) \le \limsup_{n\to\infty}\left[1 - F_n(M) + F_n(-M)\right] = 1 - F(M) + F(-M). \tag{5.10}$$

Consequently,

$$\limsup_{n\to\infty}|F_n(a) - F(a)| \le 2F(-M) + 1 - F(M), \tag{5.11}$$

which tends to 0, as one lets $M$ tend to $\infty$. The proof that (iii) implies (iv) is complete.

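We next prove that (iv) implies (i). Let $g(\cdot)$ be a bounded continuous function, and let $M$ be a positive number such that $F(\cdot)$ is continuous at $M$ and at $-M$. Since a function that is continuous on a closed interval is uniformly continuous there, $g(\cdot)$ is uniformly continuous on the interval $[-M \le x \le M]$. Fix $\epsilon > 0$, and let $d(\epsilon)$ be a positive number such that $|g(x) - g(y)| \le \epsilon$ whenever $x$ and $y$ lie in $[-M, M]$ and $|x - y| \le d(\epsilon)$. We may then choose $(K + 1)$ real numbers $a_0, a_1, \ldots, a_K$ having these properties: (i) $-M = a_0 < a_1 < \cdots < a_K = M$, (ii) $a_k - a_{k-1} < d(\epsilon)$ for $k = 1, 2, \ldots, K$, (iii) for $k = 1, 2, \ldots, K$, $F(\cdot)$ is continuous at $a_k$. Then define a function $g(\cdot\,; \epsilon, M)$:

$$g(z; \epsilon, M) = \begin{cases} 0 & \text{if } |z| > M,\\ g(a_k) & \text{if } a_{k-1} < z \le a_k \text{ for some } k = 1, 2, \ldots, K,\\ g(-M) & \text{if } z = -M.\end{cases} \tag{5.12}$$

It is clear that for $|x| \le M$

$$|g(x) - g(x; \epsilon, M)| \le \epsilon. \tag{5.13}$$

Now

$$\left|\int_{-\infty}^{\infty} g(x)\,dF_n(x) - \int_{-\infty}^{\infty} g(x)\,dF(x)\right| \le |I_n| + |J| + |L_n|, \tag{5.14}$$

where

$$I_n = \int_{-\infty}^{\infty}\left[g(x) - g(x; \epsilon, M)\right]dF_n(x), \qquad J = \int_{-\infty}^{\infty}\left[g(x) - g(x; \epsilon, M)\right]dF(x),$$

$$L_n = \int_{-\infty}^{\infty} g(x; \epsilon, M)\,dF_n(x) - \int_{-\infty}^{\infty} g(x; \epsilon, M)\,dF(x).$$

Let $C$ be an upper bound for $|g(\cdot)|$; that is, $|g(x)| \le C$ for all $x$. Then

$$|L_n| \le C\sum_{k=1}^{K}\left[|F_n(a_k) - F(a_k)| + |F_n(a_{k-1}) - F(a_{k-1})|\right]. \tag{5.15}$$

Next, we may write $I_n$ as a sum of two integrals, one over the range $|x| \le M$ and the other over the range $|x| > M$. In view of (5.13), we then have

$$|I_n| \le \epsilon + C\left[1 - F_n(M) + F_n(-M)\right].$$

Similarly

$$|J| \le \epsilon + C\left[1 - F(M) + F(-M)\right].$$

In view of (5.14), (5.15), and the two preceding inequalities, it follows that

$$\limsup_{n\to\infty}\left|\int_{-\infty}^{\infty} g(x)\,dF_n(x) - \int_{-\infty}^{\infty} g(x)\,dF(x)\right| \le 2\epsilon + 2C\left[1 - F(M) + F(-M)\right]. \tag{5.16}$$

Letting first $\epsilon$ tend to 0 and then $M$ tend to $\infty$, it follows that statement (i) holds. The proof that (iv) implies (i) is complete.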

The reader may easily verify that (v) is equivalent to the preceding statements.


THEORETICAL EXERCISES 


5.1. Convergence of the means of random variables convergent in distribution. If $Z_n$ converges in distribution to $Z$, show that for $M > 0$ such that $F_Z(\cdot)$ is continuous at $\pm M$, as $n$ tends to $\infty$,

$$\int_{-M}^{M} z\,dF_{Z_n}(z) \to \int_{-M}^{M} z\,dF_Z(z).$$

From this it does not follow that $E[Z_n]$ converges to $E[Z]$. Hint: Let $F_{Z_n}(z) = 0$, $1 - (1/n)$, $1$, depending on whether $z < 0$, $0 \le z < n$, $n \le z$; then $E[Z_n] = 1$ does not tend to $E[Z] = 0$. But if $Z_n$ converges in distribution to $Z$ and, in addition, $E[Z_n]$ exists for all $n$ and

$$\lim_{M\to\infty}\ \limsup_{n\to\infty}\int_{|z|>M}|z|\,dF_{Z_n}(z) = 0, \tag{5.17}$$

then $E[Z_n]$ converges to $E[Z]$.


5.2. On uniform convergence of distribution functions. Let $\{Z_n\}$ be a sequence of random variables converging in distribution to the random variable $Z$, so that, for each real number $z$, $\lim_{n\to\infty}F_{Z_n}(z) = F_Z(z)$. Show that if $Z$ is a continuous random variable, so that $F_Z(\cdot)$ has no points of discontinuity, then the distribution functions converge uniformly; more precisely

$$\lim_{n\to\infty}\ \sup_{-\infty<z<\infty}|F_{Z_n}(z) - F_Z(z)| = 0.$$

Hint: To any $\epsilon > 0$, choose points $-\infty = z_0 < z_1 < \cdots < z_K = \infty$ such that $F_Z(z_j) - F_Z(z_{j-1}) < \epsilon$ for $j = 1, 2, \ldots, K$. Verify that

$$\sup_{-\infty<z<\infty}|F_{Z_n}(z) - F_Z(z)| \le \max_{j=0,1,\ldots,K}|F_{Z_n}(z_j) - F_Z(z_j)| + \epsilon.$$
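Exercise 5.2 can be illustrated with the standardized binomial variables of (3.15), whose limit $Z$ is continuous. The Python sketch below (the function name is hypothetical) computes $\sup_z|F_{Z_n}(z) - \Phi(z)|$ exactly for $p = \tfrac{1}{2}$. Because $F_{Z_n}$ is a step function and $\Phi$ is increasing, the supremum is attained at a jump point, approached from the left or from the right, so only $n + 1$ points need to be examined. It is intended for moderate $n$ (say $n \le 1000$), where `math.comb(n, k)` still converts to a float.

```python
import math
from statistics import NormalDist


def kolmogorov_distance(n: int, p: float) -> float:
    """sup over z of |F_{Z_n}(z) - Phi(z)|, Z_n = (S_n - np)/sqrt(npq), S_n binomial(n, p)."""
    q = 1.0 - p
    mean, sd = n * p, math.sqrt(n * p * q)
    phi = NormalDist().cdf
    cdf_left = 0.0  # value of F_{Z_n} just to the left of the current jump point
    worst = 0.0
    for k in range(n + 1):
        z = (k - mean) / sd
        cdf_right = cdf_left + math.comb(n, k) * p**k * q ** (n - k)
        # Compare Phi(z) with both one-sided limits of the step function at this jump.
        worst = max(worst, abs(cdf_left - phi(z)), abs(cdf_right - phi(z)))
        cdf_left = cdf_right
    return worst


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, round(kolmogorov_distance(n, 0.5), 4))
```

The printed distances shrink as $n$ grows, roughly in proportion to $1/\sqrt{n}$ for this choice of $p$.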


TABLE I
Area under the Normal Density Function

A table of $\Phi(x) = \dfrac{1}{\sqrt{2\pi}}\displaystyle\int_{-\infty}^{x} e^{-y^2/2}\,dy$

[Table entries not recoverable.]



MODERN PROBABILITY THEORY 


TABLE II
Binomial Probabilities

A table of $\binom{n}{x}p^x(1-p)^{n-x}$

[Table entries not recoverable.]


TABLE II (continued)

[Table entries not recoverable.]


TABLE III
Poisson Probabilities

A table of $e^{-\lambda}\lambda^x/x!$

[Table entries not recoverable.]
Answers to
Odd-numbered Exercises

CHAPTER 1

S = {(D, D, D), (D, D, G), (D, G, D), (D, G, G), (G, D, D), (G, D, G), (G, G, D), (G, G, G)}. $A_1$ = {(D, D, D), (D, D, G), (D, G, D), (D, G, G)}, $A_1A_2$ = {(D, D, D), (D, D, G)}, $A_1 \cup A_2$ = {(D, D, D), (D, D, G), (D, G, D), (D, G, G), (G, D, D), (G, D, G)}.

4.1. (i), (xvi) {1, 2, 3}; (ii), (viii) {1, 2, 3, 7, 8, 9}; (iii), (iv), (vii), (xiii) {10, 11, 12}; (v), (vi), (xiv) {1, 2, 3, 7, 8, 9, 10, 11, 12}; (ix), (xii) {4, 5, 6}; (xi) {1, 2, 3, 4, 5, 6, 7, 8, 9}; (x), (xv) S.

4.3. (i) {10, 11, 12}; (ii) {1, 2, 3}; (iii) {4, 5, 6, 7, 8, 9}; (iv) ∅; (v) S; (vi) {1, 2, 3, 4, 5, 6, 7, 8, 9}; (vii) {4, 5, 6, 7, 8, 9}; (viii) ∅; (ix) {10, 11, 12}; (x) {1, 2, 3, 10, 11, 12}; (xi) S; (xii) S.

5.5. P [exactly 0] = 1 − P[A] − P[B] + P[AB]. P [exactly 1] = P[A] + P[B] − 2P[AB]. P [exactly 2] = P[AB]. P [at least 0] = 1. P [at least 1] = P[A] + P[B] − P[AB]. P [at least 2] = P[AB]. P [at most 0] = 1 + P[AB] − P[A] − P[B]. P [at most 1] = 1 − P[AB]. P [at most 2] = 1.

5.7.


(i) i (ii) 3 (iii) 4 
1 a 2 
з о 3 
g E 0 
1 1 1 
3 5 5 
5 5 0 
1 Е i 
в 5 1 
1 1 1 

5.9. N [exactly 0] = 400. N [exactly 1] = 400. N [exactly 2] = 100. N [at least 0] = 900. N [at least 1] = 500. N [at least 2] = 100. N [at most 0] = 400. N [at most 1] = 800. N [at most 2] = 900.




5.11. Let M, W, and C denote, respectively, the sets of college graduates, males, and married persons. Show N[M ∪ W ∪ C] = 1057 > 1000.


7.1. 12/21.
7.3. (i) 0.14, (ii) 0.07.

7.5. 3.


CHAPTER 2 
1.1. 450. 


1.3. 10,32.

1.5. 10. 

1.7. (i) 70; (ii) 2. 
1.9. n = 18, r = 10.
1.11. 204, 54, 108, 98. 
1.13. 2205. 


24. Without replacement, (i) ,4, (ii) 12, (iii) 28; with replacement, (i) #3, (ii) #4, 
(iii) $5. 

23. k 2,12 3,11 4,10 5,9 6,8 7 
with replacement ъ & RÀ A д S 
without replacement 0 w 5А 4$ & vo 


2.5. 0.026.

2.7. 0.753. 

2.9. PM, = 0.223. 

241. (i) $; (ii) 35; (iii) д. 

2.13. (i) 0; (ii) 12. 

3.1. With replacement (i) 2, (ii) 15, (iii) 2, (iv) 3; 
(ii) 21, (iii) 24, (iv) 32. 

3.3. (i), (ii) 2-10; (iii) 

3:58 


without replacement (i) i, 


La 
ue 


3.7. $(45)_5/(50)_5$.
3.9. 0.1.
3.11. Manufacturer would prefer plan (a), consumer would prefer plan (b).
3.13. $(900)_5/(1000)_5 \approx (0.9)^5 = 0.59$.
3.15. …


ANSWERS TO ODD-NUMBERED EXERCISES 449 
43. (i), Gi) 55; Gil) fs. 
4.5. (i) False, since Р[АВ] = 1; (ii) false; (iii) true; (iv) false. 
49. 42. 
4.11. (i) 3; Gi) 3; Gi, (iv) 55 (у) $; (vi) 0; (vii) 35 (viii) undefined. 
4.13. 3, 3. 


s ose (2.0 
(6-649 0-8 


5.5. ta) =a) - a (9 
20-0 

«x о (3/09: c (9/09): 9909/09. 

е ($)/* G) (зза) 8" в Xc»() (1-3) 


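6.5. (i) $P[B_0] = 1 - S_1 + S_2$, $P[B_1] = S_1 - 2S_2$, $P[B_2] = S_2$. (ii) $P[B_0] = 1 - S_1 + S_2 - S_3$, $P[B_1] = S_1 - 2S_2 + 3S_3$, $P[B_2] = S_2 - 3S_3$, $P[B_3] = S_3$. (iii) $P[B_0] = 1 - S_1 + S_2 - S_3 + S_4$, $P[B_1] = S_1 - 2S_2 + 3S_3 - 4S_4$, $P[B_2] = S_2 - 3S_3 + 6S_4$, $P[B_3] = S_3 - 4S_4$, $P[B_4] = S_4$. P [at least 1] $= S_1 - S_2 + S_3 - \cdots + (-1)^{M-1}S_M$. P [at least 2] $= S_2 - 2S_3 + 3S_4 - \cdots + (-1)^{M}(M-1)S_M$. P [at least 3] $= S_3 - 3S_4 + 6S_5 - \cdots + (-1)^{M-3}\tfrac{1}{2}(M-1)(M-2)S_M$. P [at least M] $= S_M$.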


СНАРТЕК 3 


1.1. Yes, since P[AB] — (4)? and P[4] = P[B] = $ 
(No, since P[AB] = 2 and P[A] = РІВ) = 2." 

1.3. No. 

1.5. (i) 0.729; (ii) 0.271; (iii) 0.028; (iv) 0.001.

1.9. Possible values for (P[A], P[B]) are (…) and (…).

2.1. (1) T; (2) F; (3) F; (4) T; (5) F; (6) T.

2.3. (i) …

3.1. (i) 0.240; (ii) 0.260; (iii) 0.942; (iv) 0.932.
3.3. (i) 0.328; (ii) 0.410; (iii) 0.262.

3.5. (i) 0.133; (ii) 0.072.
3.7. (i) 0.197; (ii) 0.803; (iii) 0.544.
3.9. Choose n such that $(0.90)^n < 0.01$; therefore, choose n = 44.

3.11. (i) … = 0.881; (ii) … = 0.994; (iii) … = 0.846; (iv) … = 0.035.


3.13. (i), (ii) 0.2456; (iii) 0.4096. 

3.15. $5^{n-1}/6^n$.

347 $ 

3.19. 0.010. 

3.21. (i) 0.1755; (ii) 0.5595.

3.23. (i) 0.0256; (ii) 0.0081; (iii) 0.0576.

3.25. (i) 0.03760; (ii) 9.

3.27. 0.379.

3.29. (i) 1; (ii) 4 or 5.

4.1. (i) 23; (ii) 4.

43. 
4.5. 


© T; (ii) F; (iii) T; (iv) T; (v) F; (vi) F. 


сөз 


> Ог comes from the Far West be denoted 
Then P[B| A] = £2, PIC] 4] = $, PID| A] = à. 
4.11. (i) 4; (ii) box A, 2; box B,0; box C, L 


4.13. $. 


» respectively by А, B, C, D. 


415. (i) A; (i) A; (iii) dh. 
4.17. (i) V. (ii) 32; Gii) 0.24. 
51. Ps, f) = 23, Pf, зу = 21. 


57. Фа ФІ — 2p; Gi) 


*30 = 2s Gili) Lit p <1. 
59. i 


6.1. (i), (iii) P, — P, = P; 


ро E10 
(ii) я | 4 of, P=]; 4 of; 

$44 каз 
(iv) Р, =P, = 


pue 


6.3. (i), (iii) z, = LEE (ii) =, = 2 


Fi = 1 
35779 = $, 


6.5. P has rows (p, q, 0, 0), (0, 0, p, q), (p, q, 0, 0), (0, 0, p, q). For n > 1 the rows of $P^n$ are all $(p^2, pq, pq, q^2)$.
6.7. $ifpeq-b(q — pq — pP) ifp #4. 
6.9. 1. 


CHAPTER 4 


P [exactly 2] = }. P [at least 0] =1. 


1. [exactly 1 
P [at most 0] = 1. Р [at most 1] = $. 


1.1. P [exactly 0] = Р j= 
1] = $. P [at least 2] = 


P fat least 
P [at most 2] 


2.9. (i) A = 1; (iii) (a) 0.1353, (b) 0.6321, (c) 0.2326; (iv) P[A(b)] = ет". 
2.41. (i) A = 3; Gili) (a) (44% © 1, (©) 4: (v) PIA] = Q^. 


1. 
1. 


3.9. (ii) fe) = A e=t150°; (iii) (а), (b) 0.184, (c) 0.632, (d) 0; (iv) (а) 1 — e^ 
(b) (e? — e — ет). 

3.11. (ii) f (2) =: forO<a#<1; =}for2<x<4; =0 otherwise; (iii) (а) 4, 
(b) 3, (с) 4; Чу) @ 4, 0) 3. 

4.1. (i) Hypergeometric with parameters N = 200, n = 20, p = 0.05; (ii) binomial with parameters n = 30, p = 0.51; (iii) geometric with parameter p = 0.51; (iv) binomial with parameters n = 35, p = 0.75.


43. pæ) -oo Гога = 0, 1, +*+, 6; 0 otherwise. 


45. р) = (— "t 2) / (5j for æ = 2, +++, 12; 0 otherwise. 


fora = 1, 2, +++; 0 otherwise. 


47. pe) = GG) 
49. p(x) = (= — DERG 
51. kk 


for x —2,3,::: ; 0 otherwise. 


5.3. (i) 49; (i) 355 (i ss. 
1 
5.5. P[r < z] = Р[ап(—0) < ELSE ‚лап г. 


з 


zx em. 


53. ()P[Q.3 < V < 0.4] = 0.07; Gi) PL Ina <3 
6.1.
              x      0.05    0.10    0.50     0.90     0.95     0.99
  (i)   J(x)       1.645   1.282   0.000   −1.282   −1.645   −2.326
        K(x)       0.063   0.126   0.675    1.645    1.960    2.576
  (ii)  J(x)       3.290   2.564   0.000   −2.564   −3.290   −4.652
        K(x)       0.126   0.252   1.350    3.290    3.920    5.152


6.3. 0.512.
6.5. (i), (ii) 0.2866; (iii) 0.0456. 




=Ë -2—1 
вл. He =0(5 )-*( ) 


2 
7.3. (i) 7/16; (ii) 7/64. 


CHAPTER 5 


1.1. (i) 13; (ii) 24.4; (iii) 63.6; (iv) 0; (v) 4.4.

1.3. (i) 1010; (ii) 9100; (iii) 63,600; (iv) 0; (v) 840. 

2.1. Mean (i) 3, (ii) 0, (iii) fs; variance (i) ^, (ii) 4, (iii) 53. 
2.3. Mean (i) does not exist, (ii) 0, (iii) 0; variance (i) does not exist, (ii) 3, (iii) 1. 
2.5. Mean (i) §, (ii) 4, (iii) 4; variance (i) $, (ii) 4, (iii) S. + 
2.7. Mean (i) $, (ii) 1; variance (D. i). А. 

2.9. (i) r > 2; (ii) r > 3.

34. (i) 1/1 — t); (ii) bya — 1). 

3.3. (i) 2е'(3 — е); (ii) e-o, 

3.5. (i)1,1, 1,4; (ii) 1, 1, 1,3. 

4. 250. 


43. (01 — (0) 20.9375; (i) 1 — GH = 1, Chebyshev bound 0.75. 


5.1. Chebyshev bound, (i): (a) 50,000, (b) 500; (ii): (a) 250,000, (b) 2500. Normal approximation, (i): (a) 9600, (b) 96; (ii): (a) 16,600, (b) 166.

5.3. Chebyshev bound, (i) 8000; (ii) 12,500. Normal approximation, (i) 1537; (ii) 2400.
9 D 21-е О ЖЕТ 
6.1. Bf, e. ee а / 1 EN. = ele); 
(i) mg. Tey 3i-ent5 Gi) megs, - 1 1 Tu 
(iii) mean £, variance оо 


‚ T.8.f. does not exist, (iv) mean cc; variance, m.g.f. 
does not exist, 


CHAPTER 6 


2.1. (i) (a) 0.003; (b) 0.008; (ii) (a) 0.068; (b) 0.695.
2.5. (i) 0.506; (ii) 0.532.
2.7. (i) 423; (ii) 289.


5 z 
2.9. Choose п so that (i), (ii) p (s) = »(* са 28) < 0.05; 
V24 V24 


tats) 2Vn— 9.8 
iii) Ф EU ELS - A a upper 
Gii) ( УЭ] T < 0.05. One may obtain an upp 


bound for n: (i), (ii) (1.645 /23 + 9.8) = 319; (iii) 11.645 V2] + 9.8)? = 75. 


2.11. (i) 0.983; (ii) 0.979.
3.1. 0.671, 0.000.
3.3. 0.8008.

3.5. (i) 0.109; (ii) 0.968.

3.7. (i) 0.632; (ii) not surprising, since the number of 1-minute intervals in an hour in which either no one enters or 2 or more enter obeys a binomial probability law with mean 18.9 and variance 6.96.
3.9. (i) 0.1353; (ii) 0.3233.
3.11. 15.

4.1. T = 10 hours.
4.3. N − r obeys a negative binomial probability law with parameters p = … and (i) r = 1, (ii) r = 2, (iii) r = 3.

4.5. (i) 0.0067; (ii) 0.0404.
4.7. $1 - (1 - p)^n$; n = log (0.9)/log (0.9999) = 1054.

4.9. (i) 0.368; (ii) 0.865; (iii) 0.383.


CHAPTER 7 


af 48 Be for z 20,1,:::,4; = 0 otherwise. 
Bike pa iin = J/G d 


2 +, 6; = 0 otherwise; 
2.3. Without replacement рх(®) = зо ‚6; 0 othe: , 


@ = 1) for x =1, 2, °° 


же, = +++, 6; = 0 otherwise. 
with replacement px) = ЗЕТ forz = 1,2, 16: 
forz20,1,:::,9; =0 otherwise. 
2.5. руш = ET 
2 Ses ) = 1,2,+°°,40; =0 otherwise. 
т 0 13(, = )/e: а dl, EST fona 


52 fi = 1,2,”**,49; = 0 otherwise. 
a )/6 - C2) ый 


2.9. 0.3413. 
2.11. 0.5811. 
2.13. 
2.15. 
31. 


ve xe оз 




ал. fry) = (y — 5)°/125 if O<y<5 


= (025 — у)125 if —5<у<о 


0 otherwise. 
i), Gi (E1, 22,23) P3, x3, xy 315 Va, 23). 

ы, (0, 0, 0) = Gy 
(1, 0, 0), (0, 1, 0), (0, 0, 1) 16» 
@, 1, 0), (1, 0, 1), (0, 1, 1) [OE 

(1, 1, 1) () 

otherwise 0; 

without, (1, 0, 0), (0, 1, 0), (0, 0, 1) E 

otherwise 0. 


i ЫТА ify = 0, 1, 2,3; = 0 otherwise; 
5.3. With, py,(y) = M Al 5) ify =0, 1,2,3; = d 
Pr) =G ify= 0; =1— 8} 
Pr) =1— 0 ify =0; = Gy 
without, p, (1) = pr.(1) = py,(0) = 1. 
$5. (а) (i) }, (ii) 3, Gii) 0; (b) (i) 3, (ii) €^, (iii) 0. 
5.7. Yes. 
5.9. 


ify =1; =0 otherwise; 
ify = 1; = 0 otherwise; 


(а) (1) $, ii) 4, (iii) д; (5) (i) 3%, (ii) 3, (йй) 2. 
61. 1—(1— e=), 


63. (i) Yes; (ii) Yes; (iii) yes; (iv) | — e=? 
1 


Von?” for y » 0; 
Vry 


3 (V) yes; (vi) 0.8426; (vii) /л=00) = 


— gom for 
27У uv 


=0 otherwise; (viii) Sxe, you, v) = 


!5 v > 0; = 0 otherwise; (ix) yes; (x) no. 


6.5. (i) True; (ii) false; (iii) true; (iv) false; (v) false.

6.7. (i) 0.125; (ii) 0.875.
6.9. (i) 0.393; qi) 1 — 
72. + 


In 2 = 0.307; (ii) 2, 


73. (i) 0; (ii) (2); (iii) G)’, 

75. (i) 1 + In 4): (ii) o, 

TUS ь 

7.9. (i) $1 - (0.6)^n$; (ii) $(0.4)^n$; (iii) $(0.4)^n + n(0.6)(0.4)^{n-1}$.

sa: Лер = "d €T for x > 0; = 0 otherwise 
т (kT) Б Е 


# distribution with Parameters n = 3 and o = (туж 


1 ә 
8.5. = (1 — x) for l| «1; 20 otherwise, 




= 1 
8.7. (yoV 22) Е sd (log y — »| for 2 0; = 0 otherwise. 
ul 
8.9. (a): (i) a for 1 -< y < e; = 0 otherwise; (ii) 5, for e < у < е; = 0 otherwise; 
1 2y 


(b) e~ for y > 0; = 0 otherwise. 


8.11. (а): (i) 3 for 1 <y <3; = 0 otherwise; (ii) } for —1 <y <3; =0 


. t fy —1)* 
otherwise; 0 e for 1 <y <3; = 0 otherwise. 


„n 4y А к à 
8.13. ( —— e! 3 for y > 0, 0 otherwise; (ii) —— e~!2¥° for y > 0, 0 otherwise. 
У 27 V20 


x 
8.15. (i) [2z*(1 — y) pA e 5 where y = sin zz, for |y| < 1; = 0 otherwise; 


— 0 otherwise. 


m 1 an? - 
(ii) —— sec? ye ! i for |y| < 
М2т 
1 


1 
8.17. (а) —= for 0 <y < 1; 0 otherwise; (6) — 
2Vy oN 27y 


71/25? for y > 0; 0 otherwise; 


»—wl29* for y > 0; 0 otherwise. 


8.19. Distribution function Fy(»): 


m 1 а 
(а) О Гог <0; fore = 0; = Гого <x < 1; Hforz > 1; (0) О fore < 0; 


4 for a = 0; (3) forx > 0; (с) O fore <0; 1 — e828" for a > 0. 


9.1. 0.9772. 


9.3. (1) 2y¥,0<y < 1; 0 otherwise. (ii) (1 — y), 0 < Y < 1; O otherwise; 


9.5. (i), (ii) Normal with mean 0, variance 2σ²; (iii) E e 7 for y > 0;
0 otherwise; (iv), (v) normal with mean 0, variance 2σ².


9.7. {ra + Dj. 
1: (ii) exponential with 2 = 2: 
> 0; O otherwise. 


arameters г = 3 and 4 


9.9. (i) Ga a with 2 6 
legi tad > 0; О otherwise; (iv) CI + 0) 72 fory > 


(iii) 2e" *(1 — e "^ fory 


у ys 


9.11. n— c7 el — g I2 yii for y > 0. 
gi 


9.17. $1 - n(0.8)^{n-1} + (n - 1)(0.8)^n$.


9.19. See the answer to exercise 10.3.


9.21. 312/(1 + u)* for u > 0; 0 otherwise. 


10.1. (i) 1e 12% ifn > 0, |} Ж; 
and y, 27 0; 0 otherwise. 


0 otherwise; (ii) дет) if 0 X ys <V 




MODERN PROBABILITY THEORY 


10.3. $f_{R,\Theta}(r, \theta) = r$ if $0 < r\cos\theta,\ r\sin\theta < 1$; 0 otherwise; $f_R(r) = \frac{\pi}{2} r$ for $0 < r < 1$; $= r\left(2\csc^{-1} r - \frac{\pi}{2}\right)$ for $1 < r < \sqrt{2}$; 0 otherwise; $f_\Theta(\theta) = \frac{1}{2}\sec^2\theta$ for $0 < \theta < \frac{\pi}{4}$; $= \frac{1}{2}\csc^2\theta$ for $\frac{\pi}{4} < \theta < \frac{\pi}{2}$; 0 otherwise.
11.1. (i) 1; (ii), (iii), (iv) 4.
11.3. (i) 0.865; (ii) 0.632; (iii) 0.368; (iv) 0.5.
11.5. (i) 0.276; (ii) 0.5; (iii) 0.2; (iv) 0.5; (v) 2Φ(0/2).
11.7. (i) 0.28; (ii) 0.61.


CHAPTER 8


1.1. Mean, 3; variance, 4.
1.3. Mean winnings, 20 dollars.
1.5. Mean, 1 dollar 60 cents; variance, 18 cents².
1.7. Mean, 58.75 cents; variance, 26 cents².
1.9. (i) Mean, 5.81; variance, 1.03; (ii) mean, 5.50; variance, 1.11.
1.11. With replacement, mean, 4.19; variance, 0.92; without replacement, mean, 4.5; variance, 0.45.
1.13. Mean, $\sqrt{\pi/2}$; variance, $2 - (\pi/2)$.
1.15. = 2 п+3
E[v"] = f» v dE. В
2.1. Mean, 3; variance, 3; covariances, −2.
2.3. $f_Y(y) = 2(1 - y)$ for $0 < y < 1$; $E[Y] = 1/3$, $\mathrm{Var}[Y] = 1/18$; E[Y'] = 3.
2.5. (= p)in)¥4.
2.7. Means, 1; variances, 0.5; covariance, 2a − 0.5.
2.9. Means, 4; variances, 6; covariance, бе” (2-41).
3.1. E[X] = 4, E[Y] = 4, Var[X] = À, Var[Y] = 4, ρ[X, Y] = 0; X and Y are uncorrelated but not independent.
3.3. $\sqrt{2}/3$.
3.5. (0° − T)? + 0,3).
3.7. 4a − 1.


3.9.
4.1.
4.5.
5.1.
5.3.
5.5.
5.7.
5.9.
5.11.
6.1.
6.3.
7.1.
7.3.


g 0272),

0.8413.

E[L] = 150. (i) Var[L] = 16; (ii) Var[L] = 25.6.
(i) throws more doubt than (ii).

62.

25 or more.

0.70; 7.4.

38.

1 = 0.10.

(i) n > 1537; (ii) n = 385; (iii) n = 16.
E[v]/σ[v] = 10°

(i) 0.803; (ii) −0.6x2; (iii) è; (iv) Фаз + das; (v) 0.35; (vi) 0.36.

(i) Var[Y] = 0.5; (ii) 0.


CHAPTER 9


2.1. O g gerys; G) e" 7; Gi) ее ЈА − 169; Q0 айк a quac
3.3. 2 − y? ЦИ for |y| < 1; $- 2|y| + y? − èll? for 1 < |y| < 2; 0 otherwise.
3.5. $\pi^{-1}\left[4a_1^2a_2^2 - (y - a_1^2 - a_2^2)^2\right]^{-1/2}$ for $|y - a_1^2 - a_2^2| < 2a_1a_2$; 0 otherwise.
4.5. (i) kth cumulant of S is n(k − 1)!(1 + km²); (ii) ν = (1 + m²)²/(1 + 2m²), α = (1 + 2m²)/(1 + m²).


Index 


Absolutely continuous probability law, 
402 

Absorbing Markov chain, 143 

Absorbing state, 143 

Acceptance sampling, 52, 55 

Accidents, Poisson distribution of, 252 

Aitchison, J., 315 

Anderson, T. W., 314 

Arithmetic mean, 200 

Average, 200 

ensemble average (= expectation), 

204 


Axioms of probability theory, 18, 150

Bartlett, M. S., 31
Barton, D. E., 78
Bayes's theorem, 119
Bernoulli, J., 229
Bernoulli law of large numbers, 229
Bernoulli probability law, 178
moments, 218
Bernoulli trials, independent repeated, 100
Markov dependent, 128
Bertrand's paradox, 302
Beta function, 165
Beta probability law, 182
Bharucha-Reid, A. T., 128
Binomial coefficients, 37
Binomial distribution function, tables 
available, 245 

Binomial probabilities, behavior of, 109 
table of, 442 

Binomial probability law, 53, 92, 102, 198

as conditional distribution, 341 
moments, 218 
normal approximation, 231, 235 
Poisson approximation, 105, 245 

Binomial theorem, 37 

Birth and death process, 264 

Birthday problem, 46 

Bohm, D., 30 

Boole’s inequality, 21 

Borel, E., 30 

Borel function, 151 

Borel set, 150 

Born, M., 30, 375 

Bortkewitz, L., 255 

Bose-Einstein statistics, 71 

Box, G. E. P., 334 

Bridge, 40, 73, 119 

Brown, J. A. C., 315 

Brownian motion, 374 

Buffon's needle problem, 307 


Calculus of probability density functions, 311, 330


Carnap, R., 29 
Cauchy probability law, 180 
Cauchy-Schwarz inequality, 364 
Causes, probability of, 114 
Central limit theorem, 238, 372, 430 
Certain event, 12 
Chance phenomenon, 2 
Chandrasekhar, S., 394 
Characteristic functions, 395 
continuity theorem for, 425 
expansion of, 426 
inversion of, 400 
table of, 219, 221 
Chebyshev's inequality, 226 
generalizations of, 228 
for random variables, 352 
Chi (χ) distribution, 181, 314, 326
Chi-square (χ²) distribution, 181, 314, 324, 325, 382
generating random sample of, 328 
Clarke, R. D., 260 
Coefficient of variation, 379 
Coincidences, see Matching problem 
Combinatorial analysis, basic principle, 
3 


Combinatorial product event, 96
Combinatorial product set, 286
Combinatorial product space, 96
Competition problem, 248
Complement of an event, 13
Conditional distribution function, 338
Conditional expectation, 384
Conditional probability, 60, 62, 335
density function, 339
Continuity theorem of probability theory, 425
Continuous distribution function, 169
Continuous probability law, 177
Continuous random variable, 196
Continuous random variables, jointly, 288
Convergence, in distribution, 424
in probability, 415
with probability one, 414
in quadratic mean, 415
Convolution, of distribution functions, 404
of probability density functions, 317
Correlation coefficient, 362


Coupon collecting, 79, 84, 85, 368 
Covariance, 356 

Cramér, H., 31, 365, 369, 392 
Cumulants, 399, 407 


Darling, D. A., 314 
Davenport, W. B., Jr., 255 
De Moivre, A., 78, 372 
De Moivre-Laplace limit theorem, 239 
De Morgan's laws, 16
Density function, see Probability density 
function 
Dependent events, 88 
Dependent random variables, 295 
Dependent trials, 95, 113 
Difference equations, 125, 130 
Discrete distribution function, 167 
Discrete probability law, 177, 196
Discrete random variable, 272 
Discrete random variables, jointly, 287 
Distribution function, 167 
absolutely continuous, 174 
conditions characterizing a, 173 
continuous, 169 
discrete, 167 
joint, 286 
marginal, 287 
mixed, 170 
of a random variable, 272 
singular continuous, 174 
Doob, J. L., 30 
Doubly stochastic matrix, 141 
Duration of games, 104, 346 


Eddington's n liars problem, 133
Einstein, A., 374 
Empty set, 15 
Equally likely descriptions, 25 
Ergodic Markov chain, 139 
Events, 12 
combinatorial product, 96 
depending on a trial, 94 
equality of, 13, 14 
independent and dependent, 87 
single member, 23 
Expectation, 82 
conditional, 384 
of a function with respect to a probability law, 203, 233


Expectation, of products, 361 
properties of, 206 
of a random variable, 343 
of sums, 366 

Exponential probability law, 180, 260 
characterization by a functional equation, 262

moments, 220 


F distribution, 182, 326 
mean and variance, 380 
Factorial moments, 223 
Factorials, 35 
gamma function, 162 
Stirling's formula, 163 
Feller, W., 31, 109, 128, 133, 245, 265, 
350 
Fermi-Dirac statistics, 71 
Finite sample description space, 23 
Finite set, 10 
Fisher, R. A., 103
Fourier transform, 400, 403 
Freeman, J. J., 383 
Fry, T. C., 31, 352
Function, 18, 269 
Functions of random variables, 308-334 


Galton’s quincunx, 250, 377 

Gambler’s ruin, 144 

Gamma function, 162 
Stirling’s formula, 163 

Gamma probability law, 180, 260 
moments, 220 

Gaussian distribution, see Normal 

probability law 

Geometric probability law, 179, 260 
moments, 218 

Geometrical method, 323 

Geometrical probability, 300 

Gibbs’s canonical distribution, 324, 382 

Gnedenko, B. V., 30, 430 

Godwin, H. J., 228 

Gumbel, E. J., 292 


Huyghens, C., 28 
Hypergeometric probability law, 179
binomial approximation, 54
moments, 218
Poisson approximation, 251


Image interference distribution, 405 
Impossible event, 15 
Independence, conditions for, 280, 294, 
364 
Independent events, 87 
Independent families of events, 91 
Independent random phenomena, 280 
Independent random variables, 294 
Independent trials, 95 
Indicator function, 81, 395 
Infinity, 10 
Integral, Lebesgue, 151 
Riemann, 151 
Stieltjes, 233 
Interquartile range, 214 
Intersection of events, 13 
Interval, 149 
Inventory problem, 257 
Irwin, J. O., 80, 306 


Jeffreys, H., 29 
Jensen’s inequality, 434 


Kac, M., 30 

Kemeny, J. G., 128 

Kendall, M. G., 29, 219, 379 
Kolmogorov, A. N., 30, 431 


Laplace, P. S., 25, 29 
Laplace distribution, 354, 398 
Laplace’s “equal likelihood” definition 
of the probability of a random 
event, 25 
Laplace’s rule of succession, 121 
Law of large numbers, 371 
Bernoulli, 229 
quadratic mean, 419 
strong, 420 
weak, 429 
Lévy, P., 30 
Limit, see Convergence 
Lindeberg’s condition (for central limit 
theorem), 433 
Loève, M., 30, 80, 418, 431
Lognormal distribution, 315, 348
Lukacs, E., 397
Lyapunov's condition (for central limit theorem), 432


Mallows, C. L., 228 
Marginal distribution, 287 
Markov chains, 136
Markov dependent Bernoulli trials, 128 
Markov's condition (for law of large 
numbers), 418 
Matching problem, 48, 77, 85 
moments, 224, 369 
Poisson approximation, 258 
Mathematical induction, 20 
Mathematical model, 6 
Matrix, stochastic, 140 
transition probability, 138 
Maxwell-Boltzmann law of velocities, 
237, 354 
Maxwell-Boltzmann statistics, 71 
Maxwell distribution, 181 
Mean, of a probability law, 204, 211 
geometrical interpretation, 211 
of a random variable, 346 
Mean square, of a probability law, 205
of a random variable, 346
Means, table of, 218, 220, 380
Measurement signal to noise ratio, 379
Median of a probability law, 213
Mises, R. von, 30, 417
Mode of a probability law, 213
Molina, E. C., 247, 257
Moment generating function, joint, 357
of a probability law, 215, 223
of a random variable, 346 
Moments, central, 205, 212 
joint, 356 
joint central, 356 
of a probability law, 205, 212 
geometrical interpretation, 212 
of a random variable, 346 
Moran, P. A. P., 109 
Moses, L. E., 115 
Muller, M. E., 334 
Multinomial coefficients, 40
law, 108
theorem, 40
Multinomial probabilities, behavior of, 109
Mutually exclusive events, 15, 88

Negative binomial probability law, 179
moments, 218


Neyman, J., 103, 243, 340, 388 
N(m, σ²), 348
Noise, 254, 321, 360, 383 
Normal approximation, to binomial 
probability law, 231, 239 
to Poisson law, 248 
Normal density function, 188 
Normal distribution function, 188 
Normal probability law, 180 
moments, 220 
table, 441 
Normally distributed random variable, 
274 
generating random sample of a, 334 
n-tuple, 32 


Occupancy problem, 69, 79, 84, 368 
table of solutions, 84

Odd man out, 104, 346 

Orthogonal random variables, 419 


Pareto's distribution, 211 
Partition of a set, 39 
Partitioned samples, 71 
Pascal's triangle, 38 
Poisson, S. D., 255, 417 
Poisson probabilities, behavior of, 109 
Poisson probability law, 105, 178 
approximation to binomial, 105, 245 
and stochastic processes, 252, 267 
table, 444 
Polya, G., 372 
Prediction, 49, 51, 103, 112 
minimum mean square error linear, 
386 
Probability, as a function, 18 
axioms of, 18, 100 
conditional, 60 
density function, 151 
joint, 194 
physical representation, 157 
integral transformation, 313
Laplacean "equal likelihood" defini- 
tion of, 25 
law, 177 
mass function, 155, 160 
of occurrence of a given number of 
events, 76 




Probability, prior and posterior, 120 
of a random event, 2, 18 
theory, 6 

Probabilizable set, 149 


Random division of an interval or cir- 
cle, 306 
Random event, 2, 12 
Random phenomenon, 2 
numerical valued, 148 
n-tuple, 193, 234, 276 
Random sample, 299 
Random sine wave, 311 
Random telegraph signal, 360 
Random variables, 170, 269 
continuous, 272, 274 
discrete, 272 
functions of, 308-334 
independent, 294 
jointly distributed, 285 
Random walk, 141, 143, 375, 393 
Range of a random sample, 322
Rayleigh probability law, 181
Real line, 149 
Reliability, 374 
Repeated trials, 94, 96 
random variable representation, 298 
Rice, S. O., 321 
Robbins, H., 163 
Root, W. L., 255 


Safety testing, 106 
Sample, 8, 299 
averages, 392 
drawn with replacement, 33 
drawn without replacement, 33 
ordered, 67 
partitioned, 71 
random, 299 
unordered, 67 
Sample description, 8 
space, 8 
finite, 23 
Sampling, from a sample, 117, 182, 299
problem, table of solutions, 84
Schrödinger, E., 382
Schwarz's inequality, 364
Set, 9, 10


Single-member event, 23 

Size of an event, 25 

Size of a set, 10 

Smith, E. S., 245 

Snell, J. L., 128 

Space, 9

Spurious correlation, 388 

Standard deviation, 206 

Standardization of a random variable, 
373 

State of a Markov chain, 136 

Stationary probabilities of a Markov 
chain, 139 

Stationary sequence of random vari- 
ables, 419 

Statistical equilibrium, 139 

Steinhaus, H., 30 

Stieltjes integral, 233

Stirling's formula, 163 

Stochastic matrix, 140 

Stochastic process, 264

Stuart, A., 329 

"Student" (W. S. Gosset), 25 

Student's t distribution, 180, 326
moments, 211
Subevent, 14
Subset, 12
Sums of random variables, 366, 371, 382, 391, 405
Supreme Court vacancies, 256


Takacs, L., 306 

Taylor's theorem, 166 

Telephone trunking problem, 246 

Todhunter, I., 29

Transition probabilities in Markov 
chains, 137 

Tuple, 32


Unconditional probabilities, 117 
in Markov chains, 129, 137 
Uncorrelated random variables, 362, 
367 
Uniform probability law, 180, 184
moments, 220
Unimodal distributions, 213
Union of events, 13
Uspensky, J. V., 31, 245, 307

Variance, of a probability law, 205
of a random variable, 346
of a sum of random variables, 211, 366
Variances, table of, 218, 220, 380
Variation, coefficient of, 379
Venn diagrams, 13
Vernon, P. E., 78

Waiting time, exponential law of, 260
Wallis, W. A., 256
Waugh, D. F., 119
Waugh, F. V., 119
World Series, 112, 353


Young, G. S., 263 
Yule process, 267 


